**3. Methods**

In this study, we consider the automatic extraction of the major papers in each area in order to address the issue raised in the previous section. Calculating the severity (importance) of a paper by traditional citation analysis is difficult, because the same number of citations can call for different interpretations. For example, a paper may be cited by many papers within a short period, or it may be cited steadily over the long term.

Therefore, in this study we model the citation network as a digraph in which each paper is a node and each citation is an edge, so that both of the cases mentioned above can be handled. After assigning a publication year to each node, we calculate the importance of each node by examining the variance of the publication years of the source nodes whose edges enter that node. We also aim to establish a hiangle map with

Bibliometrics is a method developed by Garfield and Price to support research activity. By analysing academic papers or patents, it can answer questions such as "What are the hot research topics?", "Which papers are cited a large number of times?", "Which papers are important?", "Which papers are related to a given area?", "Who is an important researcher?" and "Which is an important research institute?"

There are three methods of bibliometric analysis (Fig. 1).

The first is "Direct Citation". A link is considered to exist between paper A and paper C if paper A is cited in paper C. In this case, there are 2 nodes and 1 link in the network. When "Direct Citation" is used, papers are regarded as linked if one of them is itself cited by the other.

The second is "Co-Citation", the method proposed by Small [5]. A link is considered to exist between paper A and paper B if both paper A and paper B are cited by paper C. In this case, there are 2 nodes and 1 link in the network. Links are considered to exist between all pairs of papers that are listed together in the same bibliography.

The third is "Bibliographic Coupling", the method proposed by Kessler [6]. A link is considered to exist between paper D and paper E if paper C is cited in both paper D and paper E. In this case, there are 2 nodes and 1 link in the network. Links are considered to exist between all pairs of papers that cite the same paper.

Fig. 1. The kinds of citation analysis (panels: Direct Citation, Co-Citation, Bibliographic Coupling; nodes A to E)

Combining these studies has made it possible to analyse the academic landscape. Specifically, the phases "Building the citation network", "Extracting the largest connected component", "Clustering" and "Visualization" are processed automatically. However, the resulting areas cannot be interpreted automatically; at present they are analysed by experts (Fig. 2).


Fig. 2. Procedure of Academic Landscape based on network analysis
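The link types of Fig. 1 and the first two phases of Fig. 2 can be illustrated with a minimal sketch. It is not the tool used in this study: the toy data, the variable `citations` and all function names are hypothetical, and the toy data assume that paper C cites A and B while D and E both cite C.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data as (citing, cited) pairs:
# C cites A and B; D and E both cite C.
citations = [("C", "A"), ("C", "B"), ("D", "C"), ("E", "C")]


def direct_links(cites):
    """Direct citation: one undirected link per citing/cited pair."""
    return {frozenset((citing, cited)) for citing, cited in cites if citing != cited}


def co_citation_links(cites):
    """Co-citation: link two papers when a third paper cites both."""
    refs = defaultdict(set)  # citing paper -> papers in its bibliography
    for citing, cited in cites:
        refs[citing].add(cited)
    return {frozenset(pair)
            for bibliography in refs.values()
            for pair in combinations(sorted(bibliography), 2)}


def coupling_links(cites):
    """Bibliographic coupling: link two papers when both cite the same paper."""
    citers = defaultdict(set)  # cited paper -> papers that cite it
    for citing, cited in cites:
        citers[cited].add(citing)
    return {frozenset(pair)
            for group in citers.values()
            for pair in combinations(sorted(group), 2)}


def largest_component(links):
    """Return the largest connected set of papers in an undirected link set."""
    neighbours = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

With the toy data, co-citation yields a single link between A and B, bibliographic coupling yields a single link between D and E, and the largest connected component of the direct-citation network contains all five papers.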


#### **3.1 Publication databases and search query**

This study used SCOPUS as the publication database. Narrowing the search with the query "clustering" yielded 87,399 papers for this study.

#### **3.2 Variance analysis of publication years and weight of each cited paper**

Each paper is weighted by following Steps 1 to 3 below.

**Step 1.** Extract the maximum of the histogram. The year with the largest number of citations is identified with the function below and saved in MaxYear.

$$\text{MaxYear} = \arg\max_{Y}\, c(Y), \qquad c(Y) = \text{number of citations received in year } Y \tag{1}$$

**Step 2.** Identify the citation period. Examine the years chronologically around MaxYear: the first year in which the number of citations exceeds 10% of the number of citations in MaxYear is the start year and is saved in StartYear; the year in which the number of citations falls below 10% of that number is the end year and is saved in LastYear. The citation period is then given by the formula below:

$$\text{Period} := (\text{LastYear} + 1) - \text{StartYear} \tag{2}$$

In addition, if the histogram has two or more peaks, as shown in Fig. 3, the steps above are repeated for each peak and the periods are saved in Period0, Period1, ..., Periodn.

Fig. 3. Case in which the shape of the histogram departs from a normal distribution

**Step 3.** Calculate the variance of the histogram. How long a paper has been cited can be characterized by the variance (or standard deviation) of the publication years of the papers that cite the target paper. The formula below is the usual population variance, whose square root is the standard deviation; the obtained value is saved in Variance.

$$\text{Variance} = \frac{\sum \left(x - \overline{x}\right)^2}{n} \tag{3}$$

Here x is the publication year of a citing paper, $\overline{x}$ is the mean of those years and n is the number of citing papers.

Again, if the histogram is not normally distributed (it has two or more peaks), as shown in Fig. 3, the variance is calculated for each period (Period0, Period1, ..., Periodn) and the average of these values is saved in Variance. The value in Variance is then used to weight the target paper.

#### **3.3 Calculation of the severity of each cited paper**

The PageRank algorithm [7] is a technique for quantitatively calculating which page is most important when pages refer to one another, as in a hyperlink structure. In this study, we calculate the importance of papers by utilizing this algorithm. The calculation follows the steps below:

(1) Each paper has its own score, and each citation has its own score, too.

Here, the statement below is assumed to be true:

$$\text{Variance}_1 + \cdots + \text{Variance}_n = P$$

$$O_1 = \cdots = O_m = \frac{P}{m} \left(= \frac{\sum_{l=1}^{n} \text{Variance}_l}{m}\right)$$

where n is the number of incoming citations of a paper and m is the number of its outgoing citations.

In other words, the total score of the outgoing citations of a paper is assumed to equal the total score of its incoming citations, and this total is regarded as the base score of the paper. A paper is considered more important as its score becomes higher. We therefore intend to identify the major papers in each field by applying the value in Variance to the calculation of the incoming score of a paper. In the conventional algorithm, when a paper has more than one incoming citation, every citation is given an equal score. In this research, by contrast, a citation with a larger value in Variance is given a larger score. With this algorithm, the importance of a paper is calculated with the span of years over which it has been cited taken into account.

#### **3.4 Visualization based on the severity**

Fig. 4 shows a visualization of the citation network based on the importance obtained in the previous section. Each node is labelled with the title of its paper. As described in the previous section, a node with a larger importance is displayed as a larger node. In addition, this tool is called SciHi (**Sci**ence **Hi**ghangle).

Fig. 4. Example of Academic Landscape based on weight of each cited papers

**4. Evaluation experiment**

In a survey report on research trends published in 2004, Tatsubori et al. used the publication databases of IBM to survey research trends in software architecture after 1999, and manually extracted 51 major papers [8]. In this study, we examine how well the automated method can extract the major papers that were manually extracted by experts.
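The weighting procedure of Section 3.2 (Steps 1 to 3) can be sketched as follows for the single-peak case. This is an illustration under stated assumptions, not the implementation used in this study. The function name `citation_weight` and its parameters are hypothetical. LastYear is assumed to be the last year after the peak that is still at or above the 10% threshold. The variance is assumed to be taken over the citing years inside the detected period. The multi-peak case of Fig. 3 is not handled.

```python
from collections import Counter


def citation_weight(citing_years, threshold=0.10):
    """Weight one paper from the publication years of the papers citing it.

    citing_years: non-empty list of years (one entry per citing paper).
    Single-peak case only (assumption); returns a dict of the saved values.
    """
    hist = Counter(citing_years)               # year -> number of citations
    years = range(min(hist), max(hist) + 1)    # contiguous, empty years count as 0

    # Step 1: year with the largest number of citations (earliest on ties).
    max_year = max(years, key=lambda y: hist[y])
    cutoff = threshold * hist[max_year]

    # Step 2: first year whose count exceeds the cutoff ...
    start_year = next(y for y in years if hist[y] > cutoff)
    # ... and last year after the peak before the count falls below it (assumption).
    last_year = max_year
    for y in range(max_year + 1, years[-1] + 1):
        if hist[y] < cutoff:
            break
        last_year = y
    period = (last_year + 1) - start_year      # Eq. (2)

    # Step 3: population variance of the citing years inside the period, Eq. (3).
    in_period = [y for y in citing_years if start_year <= y <= last_year]
    mean = sum(in_period) / len(in_period)
    variance = sum((y - mean) ** 2 for y in in_period) / len(in_period)

    return {"max_year": max_year, "start_year": start_year,
            "last_year": last_year, "period": period, "variance": variance}


# Hypothetical citing years for one target paper.
example_years = [2000] + [2001] * 3 + [2002] * 12 + [2003] * 4 + [2004] * 2 + [2006]
result = citation_weight(example_years)
```

A paper cited in a short burst gets a small variance, while a paper cited steadily over many years gets a large one, which is the distinction the weighting is meant to capture.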

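The score relation of Section 3.3 (the base score P of a paper is the sum of the Variance values of its incoming citations, and each of its m outgoing citations carries P/m) can be worked through on a small example. This sketch only evaluates that relation once and is not a full PageRank iteration. How a Variance value is attached to each citation is an assumption here: it is simply given as input per (citing, cited) pair. The function name `base_and_outgoing_scores` and the toy values are hypothetical.

```python
from collections import defaultdict


def base_and_outgoing_scores(citations, variance):
    """Evaluate P and O = P / m for every paper.

    citations: iterable of (citing, cited) pairs.
    variance:  dict mapping each (citing, cited) pair to its Variance value
               (assumed to be supplied by the weighting step).
    Returns (base, out_score): base[p] is the sum of Variance over the incoming
    citations of p; out_score[p] is base[p] / m for papers with m > 0 outgoing
    citations.
    """
    base = defaultdict(float)
    out_degree = defaultdict(int)
    for citing, cited in citations:
        base[cited] += variance[(citing, cited)]  # incoming score, weighted by Variance
        base[citing] += 0.0                       # make sure every paper has an entry
        out_degree[citing] += 1
    out_score = {paper: base[paper] / m for paper, m in out_degree.items()}
    return dict(base), out_score


# Hypothetical toy network: B, C and D cite A; A cites X and Y.
toy_citations = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "X"), ("A", "Y")]
toy_variance = {("B", "A"): 0.5, ("C", "A"): 1.5, ("D", "A"): 2.0,
                ("A", "X"): 1.0, ("A", "Y"): 1.0}
base, out_score = base_and_outgoing_scores(toy_citations, toy_variance)
```

In this toy network paper A receives P = 0.5 + 1.5 + 2.0 = 4.0, so each of its two outgoing citations carries 2.0. Under the conventional equal split, each of the three incoming citations would instead count the same.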
