Progress of Studies of Citations and PageRank http://dx.doi.org/10.5772/intechopen.77389

evaluation indices, citation networks, information visualization, and citing behaviors. A variety of new impact measures have been proposed based on social network analysis in sociology and on network science, which originated from physics, mathematics, and information science. Bollen [3] summarized 39 impact measures and investigated the correlations between them by using principal component analysis. Bollen [3] then indicated that the notion of scientific impact is a multidimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others.

In this chapter, we focus on Google's PageRank, which was first proposed by Brin and Page [4] to obtain a list of useful web pages for user queries. If we define the usefulness of a web page as the number of links it receives from other web pages, a search engine would simply return a list of portal sites, that is, popular web pages, and such a list is of little use to web users. To overcome this problem, Brin and Page [4] used the concept of voting and defined the usefulness of a web page as the number of votes it receives from the web pages linking to it. In the PageRank algorithm, the number of ballots a web page holds is proportional to its usefulness, that is, a useful web page has many ballots. As a result, a useful web page collects votes from other useful web pages. Thus, Google's PageRank expresses the prestige of web pages. We consider that this characteristic of Google's PageRank also holds for the citation network.

This chapter is organized as follows. In Section 2, we explain the characteristics of the dataset used in this chapter. The distribution of citations and the stochastic model of the citation network are elucidated in Section 3. In Section 4, we introduce Google's PageRank and calculate it. We consider the correlation between citations and PageRank in Section 5. Section 6 is devoted to conclusions.

#### **2. Data**

In this chapter, we use the Science Citation Index Expanded (SCI-E) provided by Clarivate Analytics Co., Ltd. This dataset contains bibliographic information of scientific papers published from 1900 to the present. However, owing to the authors' limited research budget, we use the part of the dataset covering 1981 to 2015. This part contains 34,666,719 papers and 591,321,826 citations.

We denote the number of papers published in the year *t* as *n*(*t*). **Figure 1** depicts the change of *n*(*t*). In this figure, *n*(*t*) increases almost monotonically from 1981 to 2013 and decreases after 2013. However, this decrease is not real: the dataset was compiled at the beginning of 2016 and contains only part of the papers published in 2014 and 2015, because it takes a few years for all papers to be included in SCI-E.

**Figure 1.** Yearly change of the number of e-articles.

If we consider papers as nodes and regard citations from a citing paper to a cited paper as directed links, we can consider the dataset of citations as a directed network. We call such a network the citation network. The citation network consists of many connected components. We denote the number of nodes contained in a connected component as *c* and the frequency of *c* as *F*(*c*). **Figure 2** depicts *F*(*c*). We find that there is one largest connected component. It consists of 34,428,322 nodes, which is 99.3% of the total number of papers in the dataset, and 591,177,607 links, which is 99.98% of the total number of citations in the dataset. In the following sections, we focus on the largest connected component.

**Figure 2.** Distribution of the size of connected components.
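The component-size frequency *F*(*c*) described above can be computed with a union-find pass over the citation list. The following is a minimal sketch, not the procedure used for the chapter: it assumes that link direction is ignored when components are identified, and the function name `component_sizes` and the toy edge list are hypothetical.

```python
from collections import Counter


def component_sizes(num_nodes, edges):
    """Return the sizes of the connected components, ignoring link direction."""
    parent = list(range(num_nodes))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for citing, cited in edges:
        root_a, root_b = find(citing), find(cited)
        if root_a != root_b:
            parent[root_a] = root_b

    # Count how many nodes share each root.
    return list(Counter(find(node) for node in range(num_nodes)).values())


# Toy citation list of (citing, cited) pairs; node 6 is an isolated paper.
toy_edges = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 3)]
sizes = sorted(component_sizes(7, toy_edges), reverse=True)
frequency = Counter(sizes)      # F(c): number of components with c nodes
largest_share = sizes[0] / 7    # fraction of nodes in the largest component
```

For the full dataset the same loop would run over roughly 5.9 × 10<sup>8</sup> links, so an array-based implementation would be preferable to plain Python lists.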

#### **3. Distribution and dynamics of citations**

In this section, we discuss the distribution of citations and the stochastic models that generate the citation network.

### **3.1. Distribution**

The number of citations of a paper is given by the in-degree, *k*, of the corresponding node. **Figure 3** is a double-logarithmic plot of the rank size distribution, *R*(*k*), of citations. The right-tail part of the distribution decreases almost linearly on this scale. This means that this part follows a power-law distribution, that is, *R*(*k*) ∝ *k*<sup>−*μ*</sup>. Here, the exponent *μ* is called the Pareto exponent, named after the Italian economist Vilfredo Pareto. The dashed line in **Figure 3** is a reference line representing the power-law distribution with *μ* = 2, that is, *R*(*k*) ∝ *k*<sup>−2</sup>.
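To make the rank size distribution concrete, the sketch below builds *R*(*k*) from a list of citation counts and estimates the exponent by a least-squares fit on the double-logarithmic scale. It is an illustration only: the function names, the synthetic data, and the cutoff `k_min` are assumptions, and a simple regression is used here in place of whatever fitting procedure produced **Figure 3**.

```python
import math


def rank_size(citations):
    """Return sorted (k, R) pairs, where R is the number of papers with at least k citations."""
    ordered = sorted(citations, reverse=True)
    pairs = {}
    for rank, k in enumerate(ordered, start=1):
        pairs[k] = rank  # the last (largest) rank seen for k counts all papers with >= k
    return sorted(pairs.items())


def tail_exponent(pairs, k_min):
    """Negative least-squares slope of log R versus log k for k >= k_min."""
    points = [(math.log(k), math.log(r)) for k, r in pairs if k >= k_min]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Synthetic citation counts constructed so that R(k) is close to n / k**2.
n = 10_000
citations = [math.isqrt(n // i) for i in range(1, n + 1)]
pairs = rank_size(citations)
mu_hat = tail_exponent(pairs, k_min=5)  # close to 2 for this synthetic sample
```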


Pareto [5] first investigated the fat-tail behavior of the right-tail part of personal income and wealth distributions. After Pareto, many types of distribution functions have been proposed, mainly in the field of economics and especially in the investigation of personal income distribution (e.g., see [6, 7]). On the other hand, in the field of scientometrics, Price [2] first applied the power law distribution to the citation network and found that the distribution of the number of references made by a paper (the out-degree in terms of network science) follows the power law distribution with *μ* = 1 and that the distribution of the number of citations received (the in-degree in terms of network science) obeys the power law distribution with *μ* = 1.5 or *μ* = 2. The latter result is the same as the reference line in **Figure 3**.

Redner [8] investigated papers published in 1981 and cataloged by the Institute for Scientific Information (783,339 papers) and 20 years of publications in Physical Review D, vols. 11–50 (24,296 papers), and found that the right-tail part of both citation distributions follows the power law distribution with *μ* = 2. This result is the same as that of Price [2] and the reference line in **Figure 3**. Redner [9] investigated 110 years (from July 1893 through June 2003) of publications in Physical Review, the topical journals Physical Review A-E, Physical Review Letters, Reviews of Modern Physics, and Physical Review Special Topics: Accelerators and Beams (353,268 papers and 3,110,839 citations) and found that the entire distribution of the number of citations follows a log-normal distribution.

**Figure 3.** Rank size distribution, *R*(*k*), of the number of citations, *k*.

Albarrán and Ruiz-Castillo [10] studied 5 years (1998–2002) of publications in Web of Science (3.7 million papers) and found that a power law distribution for the right-tail part of the citation distribution is not rejected for 17 of the 22 scientific fields of Web of Science. Albarrán et al. [11] investigated the same dataset as Albarrán and Ruiz-Castillo [10] and found that a power law distribution for the right-tail part of the citation distribution is not rejected for 140 of the 219 scientific sub-fields of Web of Science. Recently, Brzezinski [12] investigated scientific papers published between 1998 and 2002 drawn from Scopus and found that the power law hypothesis is rejected for half of the Scopus fields of science.

Although there are many other studies besides those mentioned above, none of them uses a dataset as large as the one in this chapter to approach the overall picture of the citation distribution. The light gray line in **Figure 3** is the best fit by the generalized Beta distribution of the second kind (GB2), also called the generalized beta prime distribution (e.g., see [13, 14]), with the probability density function:

$$f(k; a, b, \mu, \nu) = \frac{a k^{a\mu - 1}}{b^{a\mu} B(\mu, \nu)} \left[ 1 + \left( \frac{k}{b} \right)^{a} \right]^{-(\mu + \nu)} \tag{1}$$

with *a* = 0.7, *b* = 15.2, *μ* = 2.0, *ν* = 3.0. Here, *B*(*μ*, *ν*) is the Beta function.
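As a numerical check of the fitted parameters, the sketch below evaluates a GB2 density and integrates it on a logarithmic grid. It assumes the standard GB2 parameterization, with *μ* and *ν* playing the roles of the two shape parameters usually written *p* and *q*; whether this matches the exact form used for the fit in **Figure 3** is an assumption, and the helper names are hypothetical.

```python
import math


def beta_function(p, q):
    """Beta function B(p, q), computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def gb2_pdf(k, a, b, mu, nu):
    """Standard GB2 density; (mu, nu) correspond to the usual shape parameters (p, q)."""
    if k <= 0:
        return 0.0
    scaled = (k / b) ** a
    norm = b ** (a * mu) * beta_function(mu, nu)
    return a * k ** (a * mu - 1) / norm * (1 + scaled) ** (-(mu + nu))


def integrate_on_log_grid(pdf, lo=-40.0, hi=40.0, steps=20_000):
    """Trapezoidal rule for the integral of pdf(k) dk using the substitution k = exp(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        x = lo + n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * pdf(math.exp(x)) * math.exp(x)
    return total * h


# Parameter values quoted in the text; the integral should be close to 1.
total = integrate_on_log_grid(lambda k: gb2_pdf(k, a=0.7, b=15.2, mu=2.0, nu=3.0))
```

Under this parameterization the density decays as *k*<sup>−(*aν*+1)</sup> for large *k*, so the tail of the rank size distribution has exponent *aν* = 2.1 for the quoted values, close to the reference line with *μ* = 2.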


**Table 1** lists the 20 most cited papers. In this table, *r*<sub>*k*</sub> is the rank by citations, *k* is the number of citations at the beginning of 2016, and *k*', which is enclosed in parentheses, is the number of citations at the beginning of January 2018. The characteristics of this list are that most of the papers belong to Biochemistry & Molecular Biology and that the publication years of the papers are relatively old.



| *r*<sub>*k*</sub> | *k* (*k*') | First author | Title | Journal, Year | Subject |
| --- | --- | --- | --- | --- | --- |
| 1 | 60,967 (62,404) | P. Chomczynski | Single-step method of RNA isolation by … | Analytical Biochemistry, 1987 | Biochemistry & Molecular Biology; Chemistry |
| 2 | 55,143 (65,452) | A.D. Becke | Density-functional thermochemistry. 3… | Journal of Chemical Physics, 1993 | Chemistry; Physics |
| 3 | 52,035 (61,637) | C.T. Lee | Development of the Colle-Salvetti correlation… | Physical Review B, 1988 | Physics |
| 4 | 45,349 (64,127) | G.M. Sheldrick | A short history of SHELX | Acta Crystallographica Section A, 2008 | Chemistry; Crystallography |
| 5 | 44,915 (64,682) | J.P. Perdew | Generalized gradient approximation… | Physical Review Letters, 1996 | Physics |
| 6 | 42,407 (46,286) | J.D. Thompson | Clustal-W – Improving the sensitivity of … | Nucleic Acids Research, 1994 | Biochemistry & Molecular Biology |
| 7 | 39,281 (44,765) | S.F. Altschul | Gapped BLAST and PSI-BLAST: a new… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 8 | 37,133 (48,832) | S.F. Altschul | Basic local alignment search tool | Journal of Molecular Biology, 1990 | Biochemistry & Molecular Biology |
| 9 | 36,988 (56,581) | K.J. Livak | Analysis of relative gene expression data… | Methods, 2001 | Biochemistry & Molecular Biology |
| 10 | 32,657 (37,653) | N. Saitou | The neighbor-joining method – A new … | Molecular Biology and Evolution, 1987 | Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity |
| 11 | 30,032 (33,046) | Z. Otwinowski | Processing of X-ray diffraction data collected… | Macromolecular Crystallography, 1997 | Biochemistry & Molecular Biology |
| 12 | 29,615 (34,235) | A.D. Becke | Density-functional exchange-energy … | Physical Review A, 1988 | Physics |
| 13 | 25,987 (29,094) | J.D. Thompson | The CLUSTAL_X windows interface: flexible… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 14 | 25,880 (33,287) | R.M. Baron | The moderator mediator variable distinction… | Journal of Personality and Social Psychology, 1986 | Psychology |
| 15 | 25,696 (29,809) | J.M. Bland | Statistical methods for assessing agreement… | Lancet, 1986 | General & Internal Medicine |
| 16 | 25,340 (30,673) | T. Mosmann | Rapid colorimetric assay for cellular … | Journal of Immunological Methods, 1983 | Biochemistry & Molecular Biology; Immunology |
| 17 | 24,308 (28,923) | S. Iijima | Helical microtubules of graphitic carbon | Nature, 1991 | Science & Technology - Other Topics |
| 18 | 23,894 (34,400) | G. Kresse | Efficient iterative schemes for ab initio total-energy calculations using … | Physical Review B, 1996 | Physics |
| 19 | 23,294 (27,062) | J. Felsenstein | Confidence-limits on phylogenies – an approach using the bootstrap | Evolution, 1985 | Environmental Sciences & Ecology; Evolutionary Biology; Genetics & Heredity |
| 20 | 21,456 (21,529) | A.P. Feinberg | A technique for radiolabeling DNA restriction endonuclease fragments … | Analytical Biochemistry, 1983 | Biochemistry & Molecular Biology; Chemistry |

**Table 1.** Top 20 papers of citation.

### **3.2. Stochastic models**

Simon [15] proposed a stochastic model, the so-called Simon's model, to explain several empirical distributions: the distribution of words in prose samples by their frequency of occurrence, of scientists by number of papers published, of cities by population, of incomes by size, and of biological genera by number of species. Although the assumptions of Simon's model are written in terms of word frequencies, we can express them in terms of network science as follows. Assumption I: the probability that a node gets a new link is proportional to its degree, that is, the rich get richer, or the Matthew effect (e.g., see [16]). Assumption II: a new node is added with a constant probability *γ*. Simon's model explains the fact that the right-tail part of the distribution follows the power law distribution with *μ* = 1/(1 − *γ*).
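The two assumptions can be simulated in a few lines. The following is a minimal sketch under stated assumptions: each step adds exactly one link, a new node enters with degree 1, and the function name `simulate_simon` and the seed handling are hypothetical choices rather than part of Simon's original formulation.

```python
import random


def simulate_simon(steps, gamma, seed=0):
    """Grow a degree sequence with Simon's two assumptions and return the list of degrees."""
    rng = random.Random(seed)
    link_ends = [0]   # one entry per link end; node 0 starts with degree 1
    num_nodes = 1
    for _ in range(steps):
        if rng.random() < gamma:
            # Assumption II: with probability gamma, a new node enters with its first link.
            link_ends.append(num_nodes)
            num_nodes += 1
        else:
            # Assumption I: a uniformly chosen link end selects a node in proportion to its degree.
            link_ends.append(rng.choice(link_ends))
    degrees = [0] * num_nodes
    for node in link_ends:
        degrees[node] += 1
    return degrees


degrees = simulate_simon(steps=20_000, gamma=0.5)
```

Storing one list entry per link end makes the preferential choice a uniform draw, which avoids recomputing degree-proportional probabilities at every step.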

Price [17] generalized Simon's model into the so-called Price's model to explain the growth of citation networks. Barabási and Albert [18] introduced a stochastic model, the so-called BA model, based on two concepts, preferential attachment and growth, which correspond to assumptions I and II of Simon's model, respectively. The BA model corresponds to the case *γ* = 1/2 of Simon's model and yields the power law distribution with *μ* = 2. Jeong et al. [19] extended the BA model to include an aging effect and a class of homogeneous connection kernels. Golosovsky and Solomon [20, 21] further extended the model to include the effect of initial attractivity.

Here, we use the model proposed by Jeong et al. [19] and check the aging effect and homogeneity of the growth of the citation network. If we denote the degree of node *i* as *k*<sub>*i*</sub>, the time evolution of *k*<sub>*i*</sub> is given by

$$\frac{dk_i}{dt} = A_i(t)\,k_i^{\alpha}. \tag{2}$$

Here, *A*<sub>*i*</sub>(*t*) is an aging factor and *α* > 0 is an unknown scaling exponent. Krapivsky et al. [22] have shown that, in the case without the aging factor, the model with *α* = 1 (linear preferential attachment) is the same as the BA model and yields the power law distribution with *μ* = 2. For *α* < 1, the model yields a stretched exponential distribution, and for *α* > 1 (super preferential attachment) a single node connects to nearly all other nodes, akin to gelation.

If we discretize the model with Δ*t* = 1 year, Eq. (2) becomes

$$\Delta k_i = A_i\,k_i^{\alpha}. \tag{3}$$

We investigate the dynamics of growth for 44,932 papers published in 1985. The left panel of **Figure 4** depicts the double-logarithmic scatter plot of the number of citations, *k*<sub>*i*</sub> (*i* = 1, 2, …, 44932), as of 1988 and the change in the number of citations, Δ*k*<sub>*i*</sub>, from 1988 to 1999. If we divide *k*<sub>*i*</sub> into bins with logarithmically equal separation, denote the mean of *k*<sub>*i*</sub> in each bin by *k̄*, and calculate the average value of Δ*k*<sub>*i*</sub> for each bin, we obtain the red dots depicted in the right panel of **Figure 4**. With these manipulations, Eq. (3) is written as

$$\overline{\Delta k} = A_t\,\bar{k}^{\alpha}. \tag{4}$$

The red solid line in the right panel of **Figure 4** corresponds to the linear regression of the red dots by Eq. (4) on the double-logarithmic scale. The slope of this line gives *α* and its intercept gives log *A*<sub>*t*</sub>. In **Figure 4**, the blue, green, and magenta dots show the same analysis for the years 1993, 2003, and 2010, respectively.

**Figure 4.** Left: Correlation between the number of citations and increase of the number of citations. Right: Change of the relation between mean citation and mean difference of citation.

**Figure 5.** Left: Change of the aging effect. Right: Change of homogeneous factor.
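The binning and regression steps for Eq. (4) can be sketched as follows. This is an illustrative implementation, not the code used for **Figure 4**: the number of bins per decade, the function names, and the synthetic input (generated with *α* = 1.2 and a prefactor of 0.05) are all assumptions chosen for the example.

```python
import math


def log_binned_means(k_values, dk_values, bins_per_decade=5):
    """Average k and its yearly change within logarithmically spaced bins of k (k >= 1)."""
    groups = {}
    for k, dk in zip(k_values, dk_values):
        if k < 1:
            continue  # log-binning is undefined for uncited papers
        index = math.floor(math.log10(k) * bins_per_decade)
        groups.setdefault(index, []).append((k, dk))
    means = []
    for index in sorted(groups):
        members = groups[index]
        mean_k = sum(k for k, _ in members) / len(members)
        mean_dk = sum(dk for _, dk in members) / len(members)
        if mean_dk > 0:
            means.append((mean_k, mean_dk))
    return means


def fit_power_law(means):
    """Least-squares fit of log(mean change) = log A + alpha * log(mean k); returns (alpha, A)."""
    xs = [math.log(k) for k, _ in means]
    ys = [math.log(dk) for _, dk in means]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, math.exp(mean_y - alpha * mean_x)


# Synthetic data that follow Eq. (3) exactly with alpha = 1.2 and A = 0.05.
k_values = list(range(1, 1001))
dk_values = [0.05 * k ** 1.2 for k in k_values]
alpha_hat, a_hat = fit_power_law(log_binned_means(k_values, dk_values))
```

Repeating the fit for each observation year would give one pair of estimates per year, which is the kind of series plotted against time in **Figure 5**.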

The left panel of **Figure 5** depicts the change of *A*<sub>*t*</sub>. The solid line in this figure corresponds to the regression by the power law function *A*<sub>*t*</sub> ∝ *t*<sup>−1.15</sup>. The right panel of **Figure 5** depicts the change of *α*. This figure shows that *α* > 1 for the entire period that we investigated. From this analysis, we find that the citation network has the characteristics of super preferential attachment; therefore, one would expect a single node to connect to nearly all other nodes. However, the aging effect prevents the citation network from becoming such an oligopolistic network.
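Why a sufficiently fast aging factor can suppress runaway growth can be seen by integrating Eq. (2) directly. The following is a sketch under simplifying assumptions: the aging factor is taken as *A*<sub>*i*</sub>(*t*) = *A*<sub>0</sub> *t*<sup>−*β*</sup> with *β* > 1 (the fit above gives *β* = 1.15), the exponent satisfies *α* > 1, and the initial value is *k*<sub>*i*</sub>(*t*<sub>0</sub>) = *k*<sub>0</sub>. The symbols *A*<sub>0</sub>, *β*, *t*<sub>0</sub>, and *k*<sub>0</sub> are introduced here for illustration only. Separating variables gives

$$k_i(t)^{1-\alpha} = k_0^{1-\alpha} - (\alpha - 1)\,A_0\,\frac{t_0^{1-\beta} - t^{1-\beta}}{\beta - 1}.$$

Because *β* > 1, the last fraction stays below *t*<sub>0</sub><sup>1−*β*</sup>/(*β* − 1) for all *t*, so the right-hand side remains positive, and *k*<sub>*i*</sub>(*t*) remains finite, whenever

$$(\alpha - 1)\,A_0\,\frac{t_0^{1-\beta}}{\beta - 1} < k_0^{1-\alpha}.$$

Without aging (*β* = 0) the subtracted term is (*α* − 1) *A*<sub>0</sub> (*t* − *t*<sub>0</sub>), which grows without bound, so the right-hand side reaches zero at a finite time and *k*<sub>*i*</sub> diverges.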

#### **4. Distribution of PageRank**

Google's PageRank was proposed by Brin and Page [4]. The Google number, *G*<sub>*i*</sub>, of paper *i* is defined by the recursion formula (from Chen et al. [23]):

$$G_i = (1 - d) \sum_{i\,\mathrm{nn}\,j} \frac{G_j}{k_j} + \frac{d}{N}. \tag{5}$$

Here, *N* = 34428322 is the total number of articles contained in the largest connected component of the citation network. The sum is over the neighboring nodes *j* from which a link points to node *i*, and *k*<sub>*j*</sub> in Eq. (5) is the number of outgoing links of node *j*. In Eq. (5), *d* is a free parameter that controls the convergence and effectiveness of the recursion calculation. In the original Google's PageRank [4], *d* = 0.15 is adopted, which is appropriate for the World Wide Web. On the other hand, *d* = 0.5 is adopted in [23], which is appropriate for the citation network.

**Figure 6** depicts the double-logarithmic plot of the rank size distribution of the Google number, *R*(*G*). In this figure, filled circles correspond to the case of *d* = 0.5 and open squares correspond to that of *d* = 0.15. The dashed line in this figure is the reference line and represents the power law distribution with *μ* = 2. This value of the exponent is the same as for the distribution of citations depicted in **Figure 3**. Although the rank size distribution of the Google number depends on *d*, the rank by Google's PageRank, *r*<sub>*G*</sub>, is almost the same in both cases, as depicted in **Figure 7**. This figure is the double-logarithmic plot of *r*<sub>*G*</sub>; the abscissa is *r*<sub>*G*</sub> in the case of *d* = 0.5, and the ordinate is *r*<sub>*G*</sub> in the case of *d* = 0.15.

**Figure 6.** Rank size distribution, *R*(*G*), of the Google number, *G*.

**Figure 7.** Correlation between the PageRank, *r*<sub>*G*</sub>, in the case of *d* = 0.5 and *d* = 0.15.
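The recursion for the Google number can be solved by simple fixed-point iteration. The sketch below is a toy illustration under stated assumptions: links are given as (citing, cited) pairs, *k*<sub>*j*</sub> is taken to be the number of references made by paper *j*, papers without references pass on no weight (so the Google numbers need not sum to 1), and the function names and the four-node example are hypothetical.

```python
def google_numbers(num_nodes, edges, d=0.5, tol=1e-12, max_iter=1000):
    """Iterate G_i = (1 - d) * sum_j G_j / k_j + d / N over links j -> i until convergence."""
    out_degree = [0] * num_nodes
    for citing, _ in edges:
        out_degree[citing] += 1
    g = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        new_g = [d / num_nodes] * num_nodes
        for citing, cited in edges:
            # Each citing paper shares its Google number equally among its references.
            new_g[cited] += (1 - d) * g[citing] / out_degree[citing]
        change = sum(abs(x - y) for x, y in zip(new_g, g))
        g = new_g
        if change < tol:
            break
    return g


def ranking(values):
    """Node indices ordered from the largest to the smallest value."""
    return sorted(range(len(values)), key=lambda i: -values[i])


# Toy citation network: paper 3 cites 2, paper 2 cites 0 and 1, paper 1 cites 0.
toy_edges = [(1, 0), (2, 0), (2, 1), (3, 2)]
g_half = google_numbers(4, toy_edges, d=0.5)
g_web = google_numbers(4, toy_edges, d=0.15)
same_order = ranking(g_half) == ranking(g_web)
```

In this toy network the two values of *d* give different Google numbers but the same ordering of papers, which mirrors the qualitative observation made for **Figure 7**.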

**Table 2** lists the top 20 papers by Google's PageRank. The characteristics of this list are that the papers belong to many subjects and that the publication years of the papers are relatively old.


| *r*<sub>*G*</sub> | *G* (10<sup>−5</sup>) | *r*<sub>*k*</sub> | *k* (*k*') | *r*<sub>*k*</sub>/*r*<sub>*G*</sub> | First author | Title | Journal, Year | Subject |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 7.1314 | 4 | 45,349 (64,127) | 4 | G.M. Sheldrick | A short history of SHELX | Acta Crystallographica Section A, 2008 | Chemistry; Crystallography |
| 2 | 3.4074 | 1 | 60,967 (62,404) | 0.5 | P. Chomczynski | Single-step method of RNA isolation by acid… | Analytical Biochemistry, 1987 | Biochemistry & Molecular Biology; Chemistry |
| 3 | 3.1210 | 26 | 18,109 (18,789) | 8.67 | G.M. Sheldrick | Phase annealing in SHELX-90 – direct methods for… | Acta Crystallographica Section A, 1990 | Chemistry; Crystallography |
| 4 | 2.8852 | 2 | 55,143 (65,452) | 0.5 | A.D. Becke | Density-functional thermochemistry. 3… | Journal of Chemical Physics, 1993 | Chemistry; Physics |
| 5 | 2.8578 | 64 | 12,824 (14,640) | 12.8 | J. Kennedy | Particle swarm optimization | IEEE International Conference, 1995 | Computer Science |
| 6 | 2.7879 | 15 | 25,696 (29,809) | 2.5 | J.M. Bland | Statistical methods for assessing agreement… | Lancet, 1986 | General & Internal Medicine |
| 7 | 2.6547 | 3 | 52,035 (61,637) | 0.43 | C.T. Lee | Development of the Colle-Salvetti correlation… | Physical Review B, 1988 | Physics |
| 8 | 2.5745 | 76 | 11,685 (18,640) | 9.5 | D.G. Lowe | Distinctive image features from scale-invariant… | International Journal of Computer Vision, 2004 | Computer Science |
| 9 | 2.4425 | 5 | 44,915 (64,682) | 0.56 | J.P. Perdew | Generalized gradient approximation made… | Physical Review Letters, 1996 | Physics |
| 10 | 2.3890 | 46 | 14,128 (17,990) | 4.6 | S. Kirkpatrick | Optimization by simulated annealing | Science, 1983 | Science & Technology - Other Topics |
| 11 | 2.3430 | 11 | 30,032 (33,046) | 1 | Z. Otwinowski | Processing of X-ray diffraction data… | Macromolecular Crystallography, 1997 | Biochemistry & Molecular Biology |
| 12 | 2.3236 | 97 | 10,368 (11,590) | 8.08 | F.H. Allen | Table of bond lengths determined by X-RAY… | Journal of the Chemical Society-Perkin Transactions 2, 1987 | Chemistry |
| 13 | 2.2868 | 6 | 42,407 (46,286) | 0.46 | J.D. Thompson | Clustal-W - improving the sensitivity of… | Nucleic Acids Research, 1994 | Biochemistry & Molecular Biology |
| 14 | 2.1787 | 8 | 37,133 (48,832) | 0.57 | S.F. Altschul | Basic local alignment search tool | Journal of Molecular Biology, 1990 | Biochemistry & Molecular Biology |
| 15 | 2.1481 | 7 | 39,281 (44,765) | 0.47 | S.F. Altschul | Gapped BLAST and PSI-BLAST: a new… | Nucleic Acids Research, 1997 | Biochemistry & Molecular Biology |
| 16 | 2.0319 | 10 | 32,657 (37,653) | 0.63 | N. Saitou | The neighbor-joining method – a new method… | Molecular Biology and Evolution, 1987 | Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity |
| 17 | 1.9081 | 17 | 24,308 (28,923) | 1 | S. Iijima | Helical microtubules of graphitic carbon | Nature, 1991 | Science & Technology - Other Topics |
| 18 | 1.8685 | 107 | 9775 (10,827) | 5.94 | H.D. Flack | On enantiomorph-polarity estimation | Acta Crystallographica Section A, 1983 | Chemistry; Crystallography |
| 19 | 1.8001 | 82 | 11,242 (12,850) | 4.32 | A.L. Spek | Single-crystal structure validation with the… | Journal of Applied Crystallography, 2003 | Chemistry; Crystallography |
| 20 | 1.7796 | 129 | 8818 (8849) | 6.45 | N. Walker | An empirical-method for correcting… | Acta Crystallographica Section A, 1983 | Chemistry; Crystallography |

**Table 2.** Top 20 papers of Google's PageRank.

#### **5. Correlation between citation and PageRank**

Bollen and Rodriquez [24] described that the Institute for Scientific Information (ISI) Impact factor (IF) which is defined as the mean number of citations a journal receives over a two-year
