*2.2.1 Data collection with commercial search engines*

The most popular web search engines, both in general use and in webometrics, are Google, Yahoo, and Bing. Each search engine uses its own crawling algorithms and its own techniques for indexing the web. In practice, this means that a user who enters a query such as "webometrics methodology" will very likely obtain different results from different search engines for the same term. The algorithms behind these engines are trade secrets of the corporations that implement them. Other search engines exist besides these three, but these are the most popular due to the quality of their results and the speed of their searches. Some engines also support special search operators that filter the results and focus them on the searched term. For example, entering "site:untz.ba" into the Google search engine returns all data related to that domain and its subdomains, i.e., all of its pages indexed by the engine.

Furthermore, entering a string of the form "site:untz.ba filetype:pdf" returns all pages and subdomains containing documents in *Adobe Portable Document Format* (Adobe PDF), together with direct links to them. These examples apply specifically to the Google search engine.
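Search strings like the ones above can also be composed programmatically. The following is a minimal sketch; the helper name `build_query` is our own, not part of any official search-engine API:

```python
def build_query(term=None, site=None, filetype=None):
    """Compose a search string from Google-style advanced operators.

    Operators such as "site:" and "filetype:" are joined with spaces,
    exactly as they would be typed into the search box.
    """
    parts = []
    if term:
        parts.append(term)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# The two examples from the text:
print(build_query(site="untz.ba"))                  # site:untz.ba
print(build_query(site="untz.ba", filetype="pdf"))  # site:untz.ba filetype:pdf
```

The same helper can combine a free-text term with operators, e.g. `build_query(term="webometrics methodology", site="untz.ba")`.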

Web search engines are very important for webometric research because their databases are a source of information covering a great part of the data on the web. Although commercial search engines are very important for searching the web and collecting data, they also have significant limitations.


Regardless of their limitations, commercial web search engines are among the best sources of information currently available, but only for certain types of webometric research. At the same time, they were not designed for the purposes of the academic community, and their results are usually not thorough enough, which this field greatly needs [28].

### *Scientometrics Recent Advances*

If the search engine's web interface is used directly, data collection can be a very time-consuming process. This problem can be overcome with dedicated software built on the application programming interfaces (APIs) provided by the companies that operate the search engines and other web services.
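API-based collection typically means requesting results page by page over HTTP. The sketch below only builds the request URLs; the endpoint name and parameter names are invented for illustration, since every provider defines its own URL, authentication scheme, and quotas:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names -- real providers each
# define their own URL, API key handling, and result-page limits.
API_ENDPOINT = "https://api.example-search.com/v1/search"


def build_api_request(query, count=50, offset=0):
    """Build the request URL for one page of programmatic search results.

    Paging through an API like this avoids scraping the human-facing
    interface, which is slow and usually against the terms of service.
    """
    params = {"q": query, "count": count, "offset": offset}
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Page through three result pages for a site-restricted query:
for page in range(3):
    url = build_api_request("site:untz.ba filetype:pdf", count=50, offset=page * 50)
    # response = urllib.request.urlopen(url)  # a real call would also send an API key
    print(url)
```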

## *2.2.2 Web crawlers as a source of data*

Another important source of data is personal *web crawlers*. Among the most popular free tools of this type used for link analysis are *SocSciBot* [21] and *LexiURL* [29]. Both crawlers were developed by Professor Mike Thelwall of the University of Wolverhampton, UK, to provide alternative strategies and methods for link analysis. In essence, these tools locate and download selected websites from the web and pass them to analytical software such as *Pajek* [30], *Ucinet* [31], or *NetDraw* [32] for data analysis and for drawing a graph of the network that represents how the data are linked.
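The crawl-then-analyze workflow can be illustrated with a minimal link extractor. This is our own sketch, not SocSciBot's actual implementation, and it runs on in-memory pages so that it works without network access:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# A real crawler would download these pages; the example domains are invented.
pages = {
    "a.example": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "b.example": '<a href="http://c.example/">C</a>',
}
link_graph = {site: extract_links(html) for site, html in pages.items()}
print(link_graph["a.example"])  # ['http://b.example/', 'http://c.example/']
```

The resulting `link_graph` dictionary is the kind of source-to-target structure that packages such as Pajek or NetDraw render as a network diagram.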

## *2.2.3 Challenges within the webometrics research field*

Webometrics operates on the principle of analyzing academic and nonacademic articles. Academic documents include publications such as e-journals, e-books, patents, and technical reports. Nonacademic documents include websites published by individuals, commercial sites, social network sites, blogs, and portals whose content-publishing process is not subject to a *peer-review* system. The greatest challenges in the webometrics research field are finding relevant sources of data and developing and implementing techniques for collecting them efficiently. Among the four research fields within webometrics, link analysis has increasingly been in focus since most commercial web search engines withdrew their support for queries over web content that enable link analysis. For these reasons, there is still a great need for alternative sources of data.
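The simplest form of the link analysis mentioned above is counting incoming links. A toy sketch (the domain names are invented for the example):

```python
from collections import Counter


def inlink_counts(link_graph):
    """Count, for each target, how many distinct pages link to it.

    link_graph maps a source page to the list of targets it links to;
    tallying incoming links is the most basic link-analysis indicator.
    """
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):  # count each source at most once
            counts[target] += 1
    return counts


graph = {
    "uni-a.example": ["uni-b.example", "uni-c.example"],
    "uni-b.example": ["uni-c.example"],
    "uni-c.example": ["uni-a.example"],
}
print(inlink_counts(graph))  # uni-c.example receives the most inlinks (2)
```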

### **2.3 Alternative sources of data**

In the first decade of the twenty-first century, most web search engines supported webometric research through special search keywords such as "site:domain," "linkdomain," and "linkfromdomain." Starting in 2012, a major change of *policy* by the owners of the web search engines affected the sources of data available for webometric research: as a result, most search engines withdrew their support for webometrics. Researchers in the field therefore sought alternative sources of data to continue their research. A survey of some of the existing systems from which data may be collected for webometric analysis is given below.

*Alexa Internet* [33] was established in 1996. As a *search engine optimization* (SEO) tool, Alexa collects data on the behavior of users on the web as they visit sites instrumented with its analytics tool. The data are analyzed to produce a global ranking or a ranking within a country; Alexa also analyzes data related to web communication, the total number of sites that refer to a given web domain, and so on.

*Advantages and Disadvantages of the Webometrics Ranking System*
*DOI: http://dx.doi.org/10.5772/intechopen.87207*

*Alexa Toolbar Service* [34] is a small software application that collects and stores information about the websites, web domains, and other sites its users visit; the tool uses these data for analyses of user behavior.

In 2005, *Who.is* [35] became a web portal for finding and collecting data about the web domains of any organization or institution. Who.is offered a unique tool for obtaining information about IP addresses, domain locations, DNS server names, domain availability, and the various organizations or universities to which those domains belonged or belong.

*Webconfs.com* [36] is an additional tool that may be used as a source of data for webometric research.

The *Majestic SEO* tool [37] is one of the best tools for analyzing *backlinks* (also called *incoming links*, *inbound links*, *inlinks*, or *inward links*). A backlink of an assessed web resource is a hyperlink pointing from some other web location to the observed one, where a web resource may be a web host, a website, or a web directory. Backlinks are one of the indicators of a website's popularity and represent a very significant source of information: the rank or value of a site within a web domain increases with the quality of its backlinks.

*Searchmetrics* [38] is a professional SEO tool that provides a survey of all data related to the visibility and social visibility of websites. The visibility of a site is analyzed through *PageRank* [39] and through analysis of *metatags*; the tool then analyzes the server and domain where the content is located (domain age, domain popularity, reverse IP addresses) and offers tools for link analysis (link popularity, backlink counts, link value, outgoing links). Social visibility covers links from social networks such as *Facebook*, *Twitter*, *LinkedIn*, and *Google+*.

*Ahrefs.com* [40] is a well-known set of tools (Site Explorer [41], Content Explorer [42], Keywords Explorer [43], Rank Tracker [44], Site Audit [45]) for analyzing backlinks to websites, and it is a very important tool for SEO analyses.

**3. University ranking systems**

These days, the Internet has become the main source of scientific information, both for the academic community and for society at large, and society has been turning to the Internet as the primary medium for presenting information to the public. On that ground, the fact that web publications are a primary tool for communication within the educational system, and that they reflect the complete picture of the quality and performance of universities, has become very important [51]. Given the development of the digital world, the influence of electronic publications today is significantly greater than that of written media or the printed versions of journals and books. Websites are the cheapest and most efficient way to support all three academic missions: to educate, to research, and to transfer knowledge [51]. This is one of the main reasons why web data have been used extensively in recent years for the evaluation of, among other things, universities and research institutions.

Ranking is a process that defines the positions of elements in a group with respect to the system as a whole, so that for any two elements in a sequence, the first is ranked "higher than," "lower than," or "equal to" the second [46]. Ranking appears in many fields, academic and otherwise. Within the academic space, ranking may be applied at different levels, from the ranking of professors, researchers, and research centers to the ranking of universities. The ranking of universities is an especially interesting field of application of ranking.
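The ranking relation just defined ("higher than," "lower than," "equal to") can be sketched in code using standard competition ranking, where tied items share a rank. The scores and university names here are invented for illustration:

```python
def competition_ranks(scores):
    """Assign standard competition ranks ("1224" style) to scored items.

    Items with equal scores receive equal ranks, matching the definition
    of ranking as ordering items higher than, lower than, or equal to
    one another.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    for position, (name, score) in enumerate(ordered, start=1):
        prev = ordered[position - 2] if position > 1 else None
        if prev is not None and prev[1] == score:
            ranks[name] = ranks[prev[0]]  # tie: share the earlier rank
        else:
            ranks[name] = position
    return ranks


scores = {"Univ. A": 92.0, "Univ. B": 87.5, "Univ. C": 87.5, "Univ. D": 80.1}
print(competition_ranks(scores))
# {'Univ. A': 1, 'Univ. B': 2, 'Univ. C': 2, 'Univ. D': 4}
```

Note that after a tie the next distinct score skips a position (no rank 3 above), which is the convention most published university rankings use.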
