**2.2 Webometrics tools for collection of data from the Internet**

Web tools that are used for collecting data from the web, such as search engines, web crawlers, and dedicated webometric software, are called *webometric tools* [5].

The area of research in the field of webometrics can, in a broader sense, be divided into the following segments:

• Analysis of website content

• Analysis of the structure of web links

• Analysis of the usage of web content

• Analysis of web technologies

To analyze data for the needs of webometrics, it is very important to know the source of information for each of the mentioned categories of webometrics. The main role of web search engines is to retrieve relevant information, on the basis of specific queries, from various heterogeneous sources of information.

Basically, there are two categories of information sources that can be used in webometric research:

• Commercial web search engines

• Personal web crawlers

Web search engines are computer programs which, on the basis of special algorithms, find relevant information on the web, index it, and store it in databases intended for that purpose.

From the point of view of webometric research, web search engines can fundamentally be divided into two categories:

• Web search engines that support searches related to the field of webometrics

• General-purpose web search engines that have no additional capability to direct searches toward terms related to the field of webometrics

Web search engines such as Google, Yahoo, and Bing enable users to access, free of charge, a vast quantity of information about the content and link structure of the web. They collect information in much the same way as the web crawlers that researchers use to gather link data. Basically, a web search engine consists of three parts: a crawler, an indexer, and an interface into which users enter the terms to be searched [10]. Building on this fact, Aguillo et al. [11] applied the advanced options of web search engines to collect data from the web for the needs of ranking universities.
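
The indexer component can be illustrated with a toy sketch, a minimal inverted index in Python. This is only an illustration of the principle, not any engine's actual implementation:

```python
from collections import defaultdict

def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return pages containing every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Two made-up pages stand in for crawled documents:
pages = {
    "http://example.org/a": "webometrics studies the quantitative aspects of the web",
    "http://example.org/b": "search engines index the web",
}
index = build_index(pages)
print(search(index, "webometrics web"))  # only the first page contains both terms
```

A real indexer additionally normalizes terms, stores positions and rankings, and compresses the index, but the term-to-documents mapping above is the core data structure the interface queries.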

Web crawlers are programs whose main objective is to collect data from precisely defined web locations. They function in the following way: they start collecting data from a given web location and then follow the links contained in that location, so that the crawler moves automatically and independently to the next location, from one site to another, for as long as there are further links to be monitored and analyzed.
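
The procedure above can be sketched as a minimal breadth-first crawler. So that the sketch runs without network access, it is fed a small in-memory "web"; the untz.ba pages below are made-up placeholders:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from collections import deque

class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns an HTML string or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited

# A tiny fake web in place of live HTTP fetching:
site = {
    "http://untz.ba/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://untz.ba/a.html": '<a href="/b.html">B</a>',
    "http://untz.ba/b.html": "no links here",
}
order = crawl("http://untz.ba/", site.get)
print(order)  # the root page first, then the two pages it links to
```

Replacing `site.get` with a function that performs a real HTTP request (and respects robots.txt and crawl delays) turns the sketch into a personal crawler of the kind described above.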

Regardless of the existence of some additional tools for link analysis, such as *LinkDiscoverer* [20], *SocSciBot* [21], and *Webometric Analyst/LexiURL Searcher* [22], Thelwall and Sud [23] underlined that researchers still depended on the *application programming interfaces* (APIs) of commercial search engines for the collection of raw data for their webometric studies. These API functions enable automatic data collection and allow programmers to write programs through which search results can be accessed. Yahoo canceled its *free-of-charge* support for its search API, Google has limited access to its API since 2011, and Bing has likewise limited *free-of-charge* access to its API 2.0 since 2012. This essentially canceled, or significantly limited, the possibility of collecting information important for extensive research within the field of webometrics. Although web search engines have a very important role in data collection, none of them is able to collect data from the whole web. The web is a dynamic environment, and there are fluctuations in the results obtained by searching.
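
As an illustration of API-based collection, the sketch below only composes a paginated request URL. The endpoint and parameter names are invented for the example, since each provider defines its own API and terms of use:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_request(endpoint, query, api_key, count=50, offset=0):
    """Compose a GET request URL for a hypothetical paginated search API."""
    params = {"q": query, "key": api_key, "count": count, "offset": offset}
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and key, shown only to illustrate the pattern:
url = build_request("https://api.example.com/search", "site:untz.ba filetype:pdf", "SECRET")
print(url)
```

An actual collection script would issue this request repeatedly with a growing `offset`, parse the returned results, and store them; it is exactly this kind of automated access that the API restrictions described above curtailed.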

*Advantages and Disadvantages of the Webometrics Ranking System*

Generally speaking, one can say that web crawlers are an essentially better tool than web search engines when it comes to webometric research.

*2.2.1 Data collection with commercial search engines*

The most popular web search engines, widely used beyond their application in webometrics, are Google, Yahoo, and Bing. Each of them uses its own search algorithms and different techniques for indexing and searching the web. In practice, this means that if a user enters a query such as "webometrics methodology", there is a high probability of obtaining different results from different search engines for the same search term. The algorithms applied by the web search engines are business secrets of the corporations behind their implementation. Besides the abovementioned search engines there are others, but these three are the most popular due to the quality of the results obtained and the speed of searching. Some search engines accept special keywords that filter the results and orient them toward the searched term. For example, if one enters the term "site:untz.ba" in the Google search engine, the query will return all data related to that domain, its subdomains, and all sites indexed by the engine. Furthermore, if one enters a string in the form "site:untz.ba filetype:pdf", the engine will return all sites and subdomains containing documents of the *Adobe Portable Document Format* (Adobe PDF) type, together with direct links to them. These examples apply specifically to the Google search engine.
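
Such operator queries can also be composed programmatically. The helper below is our own illustration; only the `site:` and `filetype:` operators themselves come from the examples above:

```python
def operator_query(terms=(), site=None, filetype=None):
    """Compose a search-engine query string using site: and filetype: operators."""
    parts = list(terms)
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)

print(operator_query(site="untz.ba", filetype="pdf"))  # site:untz.ba filetype:pdf
```

Generating queries this way makes it easy to run the same restricted search over a whole list of university domains, which is how such operators are typically used in webometric data collection.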

Web search engines are very important in researching the field of webometrics because their databases are a source of information that covers a great part of the data on the web. Although commercial search engines are very important for searching the Internet and for data collection, they have some significant limitations, among which the following stand out:

• Web search engines do not index the whole web space [24, 10, 19].

• Systems for ranking search results eliminate similar or identical sites from the results, with the objective of eliminating useless information [25, 26].

• Results may be conditioned by a national or language area [27].

• Results may fluctuate and change over time.

• The algorithms that search engines use for crawling the web and generating reports are corporate business secrets, and therefore the exact criteria for collecting, sorting, and ranking information by importance are not known [19].

• The total result returned by a web search engine is governed by the time needed for the search rather than by thoroughness and detailed accuracy of the data, since the engines apply algorithms that perform prioritization of results [4].
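
Some of these limitations can be quantified. A simple check is to measure the overlap between the result lists of two engines, or of one engine at two points in time, with the Jaccard coefficient; the URL lists below are made-up placeholders:

```python
def jaccard(a, b):
    """Jaccard similarity of two result sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

engine_1 = ["http://untz.ba/", "http://untz.ba/a", "http://untz.ba/b"]
engine_2 = ["http://untz.ba/", "http://untz.ba/b", "http://untz.ba/c"]
print(jaccard(engine_1, engine_2))  # 0.5: half of the combined results are shared
```

A low or unstable coefficient across engines or across time is direct evidence of the incomplete coverage and fluctuation listed above.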

Regardless of their limitations, commercial web search engines are among the unique and best sources of information currently available, but only for

*DOI: http://dx.doi.org/10.5772/intechopen.87207*

*Scientometrics Recent Advances*

