**5. Correlation between citation and PageRank**

Bollen and Rodriguez [24] described the Institute for Scientific Information (ISI) impact factor (IF), defined as the mean number of citations a journal receives over a two-year period, as a metric of popularity, and Google's PageRank as a metric of prestige. The same idea was proposed by Chen et al. [23] and Maslov and Redner [25], who investigated all publications in the Physical Review family of journals from 1893 to 2003 and found a linear relation between the Google number and the number of citations. Furthermore, they found that some outliers from this linear relation, especially papers whose PageRank ranking is remarkably high while their citation ranking is only moderately high, are universally familiar to physicists, and they called such papers scientific "gems" [23, 25]. Ma et al. [26] applied the concept of [23–25] to the fields of biochemistry and molecular biology from 2000 to 2005. Whereas these studies investigated the citation networks of selected scientific fields, this chapter investigates the citation network consisting of all scientific fields.

| *r<sub>G</sub>* | *G* (10<sup>−5</sup>) | *r<sub>k</sub>* | *k* (*k*') | *r<sub>k</sub>*/*r<sub>G</sub>* | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 5 | 2.8578 | 64 | 12,824 (14,640) | 12.8 | J. Kennedy | Particle swarm optimization | Proceedings of IEEE International Conference, 1995 | Computer Science |
| 22 | 1.6861 | 240 | 6500 (7458) | 10.91 | S.M. Alamouti | A simple transmit diversity technique for wireless… | IEEE Journal on Selected Areas in Communications, 1998 | Engineering; Telecommunications |
| 25 | 1.5103 | 516 | 4465 (6605) | 20.64 | I.F. Akyildiz | Wireless sensor networks: a survey | Computer Networks, 2002 | Computer Science; Engineering; Telecommunications |
| 33 | 1.4160 | 481 | 4611 (6276) | 14.58 | Z. Pawlak | Rough sets | International Journal of Computer & Information Sciences, 1982 | Information Science & Library Science |
| 36 | 1.3169 | 784 | 3740 (5402) | 21.78 | I.F. Akyildiz | A survey on sensor networks | IEEE Communications Magazine, 2002 | Engineering; Telecommunications |
| 43 | 1.2155 | 998 | 3309 (4707) | 23.21 | T.R. Gruber | A translation approach to portable ontology… | Knowledge Acquisition, 1993 | Computer Science; Information Science & Library Science |
| 48 | 1.1432 | 828 | 3656 (4463) | 17.25 | P. Gupta | The capacity of wireless networks | IEEE Transactions on Information Theory, 2000 | Computer Science; Engineering |
| 49 | 1.1387 | 1916 | 2441 (2839) | 39.10 | S. Floyd | Random early detection gateways for congestion… | IEEE-ACM Transactions on Networking, 1993 | Computer Science; Engineering; Telecommunications |
| 53 | 1.1102 | 1247 | 2991 (3879) | 23.53 | G. Bianchi | Performance analysis of the IEEE 802.11 distributed… | IEEE Journal on Selected Areas in Communications, 2000 | Engineering; Telecommunications |
| 60 | 1.0626 | 608 | 4149 (5968) | 10.13 | S. Haykin | Cognitive radio: Brain-empowered wireless… | IEEE Journal on Selected Areas in Communications, 2005 | Engineering; Telecommunications |
| 76 | 0.9431 | 967 | 3360 (3961) | 12.72 | T. Murata | Petri nets - properties, analysis and applications | Proceedings of the IEEE, 1989 | Engineering |
| 79 | 0.9388 | 1758 | 2535 (3702) | 22.25 | W.B. Heinzelman | An application-specific protocol architecture for… | IEEE Transactions on Wireless Communications, 2002 | Engineering; Telecommunications |
| 90 | 0.8884 | 1190 | 3048 (4075) | 13.22 | R. Ahlswede | Network information flow | IEEE Transactions on Information Theory, 2000 | Computer Science; Engineering |

**Table 3.** Top 20 extremely prestigious papers.

Progress of Studies of Citations and PageRank http://dx.doi.org/10.5772/intechopen.77389

**Figure 8** shows a double-logarithmic plot of the Google number, *G*, against the number of citations, *k*. In this figure, the solid gray line represents the mean value 〈*G*〉 calculated for bins of *k* with logarithmically equal width. The figure shows that 〈*G*〉 is a smooth function of *k* and increases linearly with *k* for *k* ≥ 500. Thus, the Google number and the number of citations are nearly equivalent measures of the importance of papers. In other words, prestige (Google number) is proportional to popularity (citations) in many cases.
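The binning behind 〈*G*〉 can be sketched as follows. This is a minimal illustration rather than the procedure actually used for Figure 8: the function names `log_binned_mean` and `loglog_slope`, the number of bins per decade, and the least-squares slope used to check proportionality are all assumptions made for this example, and the data are synthetic.

```python
import math
from collections import defaultdict


def log_binned_mean(k_values, g_values, bins_per_decade=5):
    """Average G over bins of k that have equal width in log10(k).

    bins_per_decade is an illustrative choice, not a value from the chapter.
    Returns a list of (geometric bin center, mean G), sorted by k.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, g in zip(k_values, g_values):
        if k <= 0:
            continue  # uncited papers cannot be placed on a logarithmic axis
        index = math.floor(math.log10(k) * bins_per_decade)
        sums[index] += g
        counts[index] += 1
    return [
        (10 ** ((index + 0.5) / bins_per_decade), sums[index] / counts[index])
        for index in sorted(sums)
    ]


def loglog_slope(points):
    """Least-squares slope of log10(G) against log10(k).

    A slope close to 1 means that G is proportional to k.
    """
    xs = [math.log10(k) for k, _ in points]
    ys = [math.log10(g) for _, g in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data in which G is constructed to be proportional to k.
ks = [500, 700, 1000, 2000, 5000, 10000]
gs = [1e-7 * k for k in ks]
print(log_binned_mean(ks, gs, bins_per_decade=2))  # one (center, mean G) per bin
print(loglog_slope(list(zip(ks, gs))))  # close to 1 for proportional data
```

On real data, the slope would be fitted only to the range where the relation is reported to be linear (*k* ≥ 500); a fitted value near 1 is what "proportional" means on a double-logarithmic plot.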

However, there are outliers whose prestige is high compared with their popularity. These papers lie above the solid gray line in **Figure 8** and are regarded as extremely prestigious papers. If we denote the citation rank as *r<sub>k</sub>* and the rank by Google's PageRank as *r<sub>G</sub>*, these extremely prestigious papers are extracted in order of *r<sub>G</sub>*, subject to a constraint on the ratio *r<sub>k</sub>*/*r<sub>G</sub>*. **Table 3** lists the top 20 extremely prestigious papers selected with the constraint *r<sub>k</sub>*/*r<sub>G</sub>* > 10. A characteristic of this list is that the subjects of the papers are almost all related to information science.
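The selection rule can be illustrated with a short sketch. The helper names `rank_descending` and `extremely_prestigious`, the tie-breaking by input order, and the synthetic 30-paper data set are assumptions made for illustration; only the threshold of 10 and the ordering by *r<sub>G</sub>* come from the text.

```python
def rank_descending(scores):
    """Map each paper id to its rank, where rank 1 has the largest score.

    Ties are broken by input order, which is an assumption of this sketch.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {paper: position for position, paper in enumerate(ordered, start=1)}


def extremely_prestigious(citations, google_numbers, min_ratio=10.0, top=20):
    """Return (paper, r_G, r_k, r_k / r_G) for papers with r_k / r_G > min_ratio.

    Both dictionaries must use the same paper ids as keys.
    The result is ordered by PageRank rank r_G and truncated to `top` entries.
    """
    r_k = rank_descending(citations)
    r_g = rank_descending(google_numbers)
    selected = [paper for paper in r_g if r_k[paper] / r_g[paper] > min_ratio]
    selected.sort(key=lambda paper: r_g[paper])
    return [(p, r_g[p], r_k[p], r_k[p] / r_g[p]) for p in selected[:top]]


# Synthetic data: citations and Google numbers both fall with the paper index,
# except for paper25, which is given the largest Google number.
citations = {f"paper{i}": 1000 - 10 * i for i in range(30)}
google_numbers = {f"paper{i}": 3.0 - 0.1 * i for i in range(30)}
google_numbers["paper25"] = 5.0

print(extremely_prestigious(citations, google_numbers))
# [('paper25', 1, 26, 26.0)]
```

As a check against the published values, the entry of Table 3 with *r<sub>G</sub>* = 5 and *r<sub>k</sub>* = 64 has *r<sub>k</sub>*/*r<sub>G</sub>* = 12.8, which satisfies the constraint.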

**Figure 8.** Correlation between the number of citations, *k*, and the Google number, *G*.


224 Scientometrics


| *r<sub>G</sub>* | *G* (10<sup>−5</sup>) | *r<sub>k</sub>* | *k* (*k*') | *r<sub>k</sub>*/*r<sub>G</sub>* | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 93 | 0.8767 | 1565 | 2691 (3401) | 16.83 | T. Wiegand | Overview of the H.264/AVC video coding standard | IEEE Transactions on Circuits and Systems for Video Technology, 2003 | Engineering |
| 97 | 0.8598 | 1045 | 3245 (4674) | 10.77 | M. Dorigo | Ant system: Optimization by a colony of… | IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, 1996 | Automation & Control Systems; Computer Science |
| 116 | 0.7923 | 2736 | 2052 (2426) | 23.59 | D. Harel | Statecharts - a visual formalism for… | Science of Computer Programming, 1987 | Computer Science |
| 120 | 0.7838 | 4059 | 1705 (2982) | 33.83 | M. Weiser | The Computer for the 21st-century | Scientific American, 1991 | Science & Technology - Other Topics |
| 121 | 0.7796 | 1406 | 2840 (4011) | 11.62 | S. Deerwester | Indexing by latent semantic analysis | Journal of the American Society for Information Science, 1990 | Computer Science; Information Science & Library Science |
| 128 | 0.7584 | 3165 | 1914 (1948) | 24.73 | A.E. Leviton | Standards in herpetology and ichthyology… | Copeia, 1985 | Zoology |
| 129 | 0.7582 | 7409 | 1274 (1478) | 57.43 | X.Y. Wang | Room-temperature all-semiconducting… | Physical Review Letters, 2008 | Physics |

**Table 3.** Top 20 extremely prestigious papers (continued).

| *r<sub>k</sub>* | *k* (*k*') | *r<sub>G</sub>* | *G* (10<sup>−5</sup>) | *r<sub>G</sub>*/*r<sub>k</sub>* | First author | Title | Journal, Year | Subject |
|---|---|---|---|---|---|---|---|---|
| 125 | 8890 (17,192) | 627 | 0.3250 | 5.02 | D. Hanahan | Hallmarks of Cancer: The Next Generation | Cell, 2011 | Biochemistry & Molecular Biology; Cell Biology |
| 297 | 5817 (10,877) | 1580 | 0.2042 | 5.32 | D.W. Huang | Systematic and integrative analysis of large gene list… | Nature Protocols, 2008 | Biochemistry & Molecular Biology |
| 304 | 5747 (9681) | 1608 | 0.2023 | 5.29 | Y. Zhao | The M06 suite of density functionals for main… | Theoretical Chemistry Accounts, 2008 | Chemistry |
| 327 | 5533 (8874) | 1810 | 0.1897 | 5.54 | D.P. Bartel | MicroRNAs: Target Recognition and… | Cell, 2009 | Biochemistry & Molecular Biology; Cell Biology |
| 375 | 5147 (6894) | 2128 | 0.1757 | 5.67 | B.P. Lewis | Conserved seed pairing, often flanked by… | Cell, 2005 | Biochemistry & Molecular Biology; Cell Biology |
| 414 | 4912 (5825) | 2506 | 0.1619 | 6.05 | T. Jenuwein | Translating the histone code | Science, 2001 | Science & Technology - Other Topics |
| 419 | 4895 (5350) | 2123 | 0.1759 | 5.07 | P. Li | Cytochrome c and dATP-dependent formation… | Cell, 1997 | Biochemistry & Molecular Biology; Cell Biology |
| 535 | 4382 (4604) | 2802 | 0.1534 | 5.24 | Z.G. Xia | Opposing effects of ERK and JNK-P38 MAP… | Science, 1995 | Science & Technology - Other Topics |
| 543 | 4343 (5864) | 3120 | 0.1447 | 5.75 | R.C. Lee | The C. elegans heterochronic gene… | Cell, 1993 | Biochemistry & Molecular Biology; Cell Biology |
| 547 | 4327 (4633) | 2865 | 0.1517 | 5.24 | A. Hall | Rho GTPases and the actin cytoskeleton | Science, 1998 | Science & Technology - Other Topics |
| 600 | 4164 (5479) | 3269 | 0.1411 | 5.45 | S. Akira | Pathogen recognition and innate immunity | Cell, 2006 | Biochemistry & Molecular Biology; Cell Biology |
| 611 | 4144 (4888) | 3585 | 0.1348 | 5.87 | B.D. Strahl | The language of covalent histone modifications | Nature, 2000 | Science & Technology - Other Topics |
| 640 | 4063 (5604) | 3359 | 0.1390 | 5.25 | M.E. Raichle | A default mode of brain function | PNAS, 2001 | Science & Technology - Other Topics |
| 645 | 4054 (5303) | 3326 | 0.1398 | 5.16 | E.K. Miller | An integrative theory of prefrontal cortex function | Annual Review of Neuroscience, 2001 | Neurosciences & Neurology |
| 657 | 4026 (4967) | 3572 | 0.1351 | 5.44 | R.O. Hynes | Integrins: Bidirectional, allosteric signaling… | Cell, 2002 | Biochemistry & Molecular Biology; Cell Biology |
| 661 | 4005 (4335) | 4096 | 0.1262 | 6.20 | S.R. Datta | Akt phosphorylation of BAD couples survival… | Cell, 1997 | Biochemistry & Molecular Biology; Cell Biology |
| 706 | 3912 (5288) | 4825 | 0.1166 | 6.83 | T. Kouzarides | Chromatin modifications and their function | Cell, 2007 | Biochemistry & Molecular Biology; Cell Biology |


**Table 4.** Top 20 extremely popular papers.

… linear relation as a benchmark and selected extremely prestigious and extremely popular papers. We found that the subjects of the extremely prestigious papers are almost all related to information science. Furthermore, we found that extremely popular papers are divided into popular papers and rising papers.

We conclude this chapter by describing two remaining issues. One concerns the stochastic model. Though we introduce GB2 as the best-fit function to the whole range of the citation distribution, there is no stochastic model that explains GB2. The other concerns the weight of links in the citation network. Almost all studies have investigated citation networks as unweighted networks. However, it is possible to define weights for links, for example, the similarity between papers.

**Acknowledgements**

This work is supported by Nihon University College of Science and Technology Grants-in-Aid 2012 and 2016. The authors thank the Yukawa Institute of Theoretical Physics at Kyoto University. Discussions during the YITP workshop YITP-W-17-14 on "Econophysics 2017" were useful to complete this work.

**Author details**

Wataru Souma<sup>1</sup>\* and Mari Jibu<sup>2</sup>

\*Address all correspondence to: souma.wataru@nihon-u.ac.jp

1 College of Science and Technology, Nihon University, Japan

2 Japan Science and Technology Agency, Japan

**References**

[1] Hou J. Exploration into the evolution and historical roots of citation analysis by referenced publication year spectroscopy. Scientometrics. 2017;**110**:1437-1452. DOI: 10.1007/s11192-016-2206-9

[2] de Solla Price DJ. Networks of scientific papers. Science. 1965;**149**:510-515. DOI: 10.2307/1716232

[3] Bollen J, Van de Sompel H, Hagberg A, Chute R. A principal component analysis of 39 scientific impact measures. PLoS One. 2009;**4**:e6022. DOI: 10.1371/journal.pone.0006022

[4] Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems. 1998;**30**:107-117. DOI: 10.1016/S0169-7552(98)00110-X

On the other hand, there are also outliers whose prestige is low compared with their popularity. These articles lie below the solid gray line in **Figure 8** and are regarded as extremely popular papers. They are extracted in order of citation rank, subject to a constraint on the ratio *r<sub>G</sub>*/*r<sub>k</sub>*. **Table 4** lists the top 20 extremely popular papers selected with the constraint *r<sub>G</sub>*/*r<sub>k</sub>* > 5. These articles fall into two groups. One group contains papers published in Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS). Most of these papers were published more than 10 years ago, and their citation growth rate, *k*'/*k*, is low. The other group consists of papers that were published mainly in Cell and relatively recently. Their citation growth rate, *k*'/*k*, is extremely high, so we can regard these papers as rising papers.
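A small sketch of this selection and grouping is given below, using four rows of Table 4 as input. The `Paper` class, the function names, and in particular the cut-off `rising_threshold = 1.5` are hypothetical: the text describes the growth rate *k*'/*k* only qualitatively ("low" versus "extremely high") and does not state a numerical boundary between the two groups.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    author: str
    r_k: int      # citation rank
    r_g: int      # rank by Google's PageRank
    k: int        # number of citations
    k_prime: int  # k' as listed in parentheses in Table 4


def extremely_popular(papers, min_ratio=5.0):
    """Keep papers with r_G / r_k > min_ratio, ordered by citation rank."""
    chosen = [p for p in papers if p.r_g / p.r_k > min_ratio]
    return sorted(chosen, key=lambda p: p.r_k)


def split_by_growth(papers, rising_threshold=1.5):
    """Split papers by growth rate k'/k; the threshold is a hypothetical value."""
    rising = [p.author for p in papers if p.k_prime / p.k >= rising_threshold]
    steady = [p.author for p in papers if p.k_prime / p.k < rising_threshold]
    return rising, steady


# Four rows taken from Table 4.
papers = [
    Paper("T. Jenuwein", 414, 2506, 4912, 5825),   # Science, 2001
    Paper("D. Hanahan", 125, 627, 8890, 17192),    # Cell, 2011
    Paper("Z.G. Xia", 535, 2802, 4382, 4604),      # Science, 1995
    Paper("D.P. Bartel", 327, 1810, 5533, 8874),   # Cell, 2009
]

popular = extremely_popular(papers)
rising, steady = split_by_growth(popular)
print(rising)  # ['D. Hanahan', 'D.P. Bartel']
print(steady)  # ['T. Jenuwein', 'Z.G. Xia']
```

With this illustrative threshold, the two recent Cell papers (growth rates of about 1.9 and 1.6) fall into the rising group, while the two older Science papers (about 1.2 and 1.05) do not. Other rows of Table 4 have intermediate growth rates, so the grouping is sensitive to the chosen cut-off.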
