**2.2. Investigating the intellectual structure in scientometrics**

Scientometrics depicts the intellectual landscape of a science through a variety of bibliographic units, such as authors, keywords, texts, and citations, and through networks of those entities. The present chapter systematically mapped the historical footprint and emerging technologies found in published scientometrics research. In particular, we investigated citation paths at a disciplinary level, the co-occurrence of WoS categories and keywords, and networks of co-cited references. Network clustering and topic modeling were also used to find homogeneous sets of literature and coherent streams of research. In so doing, we captured emerging trends, recent developments, and current challenges in the domain. We took a top-down approach, moving from macro-level to micro-level analysis, which let us add richer interpretations as we moved to progressively lower-level units of analysis: journal-level citation paths, subject categories, keywords, titles and abstracts, and cited references. To this end, this chapter relies mainly on two software suites, CiteSpace [4–6] and VOSviewer [7]. The input is a collection of bibliographic records relevant to a topic of interest. Given these records, the toolkits detect and render thematic patterns and emerging trends in science as networks of various bibliographic units. As argued in preceding papers [8, 9], this approach has several methodological merits over a conventional domain analysis. First, a much more inclusive range of topically relevant literature can be examined. Second, the analyst does not need prior expertise in the domain of interest. Finally, this kind of survey can be repeated as often as needed, given the fast growth of a science. To make the findings of the present chapter easier to follow, we first introduce the underlying techniques:

**Table 2.** Querying terms (the wildcard character "\*" captures any relevant variations of a term).

| Term | Duration | Total | Articles | Procs. | Reviews |
|---|---|---|---|---|---|
| bibliometric\* | 1990–2017 | 6352 | 5449 | 313 | 590 |
| scientometric\* | 1990–2017 | 1779 | 1577 | 93 | 109 |
| informetric\* | 1990–2017 | 382 | 334 | 28 | 20 |
| webometric\* | 1997–2017 | 288 | 254 | 25 | 9 |
| altmetric\* | 2012–2017 | 261 | 237 | 7 | 17 |
| cybermetric\* | 1999–2015 | 28 | 27 | 1 | |
| entitymetric\* | 2013–2015 | 3 | 3 | — | — |
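As a rough illustration of how the trailing wildcard in Table 2 behaves, the sketch below converts each query term into a word-prefix pattern and reports which terms match a piece of text. The helper names `wildcard_to_regex` and `matching_terms` are hypothetical, and the matching rule (case-insensitive, word-boundary prefix match) is an assumption; the actual search semantics of the bibliographic database may differ.

```python
import re

# Query terms as listed in Table 2.
QUERY_TERMS = [
    "bibliometric*", "scientometric*", "informetric*", "webometric*",
    "altmetric*", "cybermetric*", "entitymetric*",
]


def wildcard_to_regex(term):
    """Turn a trailing-* query term into a case-insensitive word-prefix regex.

    Assumption: "*" stands for any run of word characters after the stem.
    """
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def matching_terms(text, terms=QUERY_TERMS):
    """Return the query terms (in list order) that match somewhere in text."""
    return [t for t in terms if wildcard_to_regex(t).search(text)]


if __name__ == "__main__":
    print(matching_terms("A bibliometrics and altmetrics study of citations"))
```

Because a record can match more than one term, per-term counts obtained this way can overlap, so they need not add up to the size of the deduplicated collection.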

Scientometrics of Scientometrics: Mapping Historical Footprint and Emerging Technologies…

http://dx.doi.org/10.5772/intechopen.77951


• Network reduction: In network analysis, examining every node and edge is computationally challenging, and a dense network with many links is visually overwhelming and fails to convey its topological structure intuitively. To handle this, we select up to 100 of the most frequently occurring entities, such as keywords and cited references, within each one-year time slice.

• Clustering: Clustering is an unsupervised learning technique that uncovers latent groups of entities sharing homogeneous characteristics. We employ a network clustering technique called smart local moving [10] to capture thematically similar clusters in a document co-citation network.

• Burst detection: Proposed in [11], burst detection models the burstiness of features that rise sharply in frequency. An entity is bursting when it appears intensively during a specific span of time. Burst detection lets us overcome the limitation of relying on cumulative, snapshot metrics as impact measures.

• Cluster labeling: CiteSpace labels clusters with terms extracted from the titles and abstracts of citing articles. Three algorithms are available for cluster labeling: (1) latent semantic analysis (LSA), (2) log-likelihood ratio (LLR), and (3) mutual information (MI). LSA captures unknown semantic relationships over all the documents, while LLR and MI reflect a unique aspect of a cluster [5].

• Topic modeling: Topic modeling is an unsupervised machine learning technique that aims to discover the latent semantic structure of a text body. We employ dynamic topic modeling.
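The network reduction step in the list of techniques (keeping the most frequent entities within each one-year time slice) can be sketched as follows. This is a minimal illustration, not the toolkits' implementation: the record layout `(year, [entities])`, the function names, and the alphabetical tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_entities_per_slice(records, top_n=100):
    """Select the top_n most frequent entities within each one-year slice.

    records: iterable of (year, [entity, ...]) pairs (hypothetical layout).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for year, entities in records:
        counts[year].update(set(entities))  # count an entity once per record
    selected = {}
    for year in sorted(counts):
        ranked = sorted(counts[year].items(), key=lambda kv: (-kv[1], kv[0]))
        selected[year] = [entity for entity, _ in ranked[:top_n]]
    return selected


def cooccurrence_edges(records, selected):
    """Count co-occurrence links, keeping only the entities retained per slice."""
    edges = Counter()
    for year, entities in records:
        kept = sorted(set(entities) & set(selected.get(year, [])))
        edges.update(combinations(kept, 2))
    return edges


if __name__ == "__main__":
    demo = [
        (2016, ["citation", "h-index", "altmetrics"]),
        (2016, ["citation", "h-index"]),
        (2017, ["altmetrics", "twitter"]),
    ]
    nodes = top_entities_per_slice(demo, top_n=2)
    print(nodes)
    print(cooccurrence_edges(demo, nodes))
```

Selecting nodes before counting links keeps the rendered network small, at the cost of dropping rare entities that might still be of interest.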





cases text fields were omitted. Thus, we excluded data before 1990. Brief statistics of the retrieved data set are given in **Table 1**.

**Table 1.** Data statistics.

| Duration | Total | Articles | Procs. | Reviews | Authors | Keywords | Refs. |
|---|---|---|---|---|---|---|---|
| 1990–2017 | 8098 | 7013 | 413 | 672 | 23,791 | 98,493 | 328,096 |

**Figure 3** shows the distribution of records over time in our data collection. As illustrated, interest in scientometrics from the community has been increasing exponentially.

**Figure 3.** The distribution of records over time.

**Table 2** lists the terms used for data retrieval and the number of records matching each term. As shown, the literature has used "bibliometric\*" most frequently.


Dynamic topic modeling (DTM) is a generative technique extended from Latent Dirichlet Allocation (LDA). DTM captures the evolution of latent topics in a collection of documents, which the preceding model does not account for [12].
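To make the burst detection idea from the list of techniques more concrete, the sketch below implements one common formulation: a two-state automaton over yearly counts, solved with a Viterbi-style pass. Whether this matches the exact variant and settings used for the chapter is an assumption; the function name `detect_bursts` and the parameters `s` (how much higher the burst rate is than the baseline) and `gamma` (the penalty for entering a burst) are illustrative choices.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time slice 0 (baseline) or 1 (bursting).

    relevant[t]: records in slice t that mention the entity.
    totals[t]:   all records in slice t (relevant[t] <= totals[t]).
    """
    n = len(relevant)
    total = sum(totals)
    p0 = sum(relevant) / total if total else 0.0   # baseline rate
    if not 0.0 < p0 < 1.0:
        return [0] * n                             # nothing to model
    rates = (p0, min(0.9999, s * p0))              # (baseline, burst) rates
    up_cost = gamma * math.log(n)                  # cost of moving 0 -> 1

    def fit_cost(state, r, d):
        # Negative log of the binomial probability of r hits out of d.
        p = rates[state]
        return -(math.log(math.comb(d, r))
                 + r * math.log(p) + (d - r) * math.log(1.0 - p))

    cost = [0.0, math.inf]                         # start in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for state in (0, 1):
            options = [
                cost[prev] + (up_cost if prev == 0 and state == 1 else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best_prev] + fit_cost(state, r, d))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    return states[::-1]


if __name__ == "__main__":
    yearly_hits = [1, 1, 1, 9, 10, 9, 1, 1]
    yearly_totals = [100] * 8
    print(detect_bursts(yearly_hits, yearly_totals))
```

Unlike a cumulative count, the returned state sequence marks when an entity was unusually active, so two entities with the same total frequency can be told apart by the timing and length of their bursts.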
