## *Bibliometric Method for Mapping the State-of-the-Art and Identifying Research Gaps… DOI: http://dx.doi.org/10.5772/intechopen.85856*

*Scientometrics Recent Advances*

In light of this, the importance of bibliometrics is worth noting: it is a method for measuring, monitoring, and studying scientific outputs [3, 4]. Bibliometrics enables the mapping and expansion of knowledge in a particular area of research, revealing connections between the main publications, authors, institutions, themes, and other characteristics of the field under study [3, 5].

An important application of bibliometric methods is their use as a tool for research evaluation [2, 4]. Papers that stand out in bibliometric studies are regarded as reliable and relevant sources of results and are often used to justify decisions on research policies, funding, job offers, and promotions, and also to direct and support research projects on the basis of what is most relevant in the scientific literature [2, 6]. Knowledge evolves continuously, and the novelty of the researchers' proposals is the basic premise for developing scientific research with strong scientific and applied contributions. Therefore, analyzing the state of the art of the field studied is an indispensable step in choosing a notable research problem, since it can reveal gaps in the literature that need to be filled and important studies to underpin the researchers' proposals [6].

In this sense, it is important to emphasize that research funding bodies have increasingly required evidence that the research they support has the potential to impact society with innovations, advances, etc. [2, 4, 7, 8].

Bibliometric methods involve the use of several tools that can help researchers identify a relevant and current research problem, thus making clear the potential impact of the research should it be developed [4, 6].

Information technology (IT) tools can be used to assist in searching for relevant scientific content, collecting scientific data, and summarizing the results obtained. These tools may be extremely important in clarifying the direction of a particular field of study and what advances and developments it may still present [4, 6].

Bibliometric indicators, if properly analyzed, can give more consistency to the research project, since they use statistics from different bibliographic databases that differ in terms of scope, data volume, and coverage [4].

Thus, the researcher who develops a research project based on bibliometric analysis can present the objectives and methods of the work clearly and concisely by illustrating which scientific gaps in the field will be filled by the study.

Hence, the purpose of this chapter is to present a method of bibliometric analysis for mapping the state of the art and identifying gaps and trends of research in the literature. Throughout the sections, some important bibliometric tools and analyses to justify the development of research projects will be set out.

The proper use of the presented method allows gaps and research trends to be understood from the mapping of the state of the art of a field, offering a path for researchers to follow when preparing their research project in order to ensure that their studies make real scientific, applied, and social contributions.

**2. Mining and analysis of bibliometric data**

The application of this method is oriented toward mapping the state of the art of a scientific theme through the characterization of bibliometric parameters. The bibliometric parameters used are those most widely available on scientific research platforms. For this reason, the presented method can be applied across the different fields of knowledge; it is up to the researcher to adapt it to the demands of the study.

Another relevant characteristic of the method is that it was primarily designed to be applied in the initial phase of a new study. By mapping the state of the art, the bibliometric analysis gives the researcher essential information on which to base the study. Applying bibliometric analysis in the initial phase of a study can ensure that the relevant references in the literature are considered when constructing the new research. In addition, the results of the bibliometric study shed light on gaps in the literature, which can substantiate the scientific demand for, and the originality of, the proposed study.

The expected results of conducting the bibliometric analysis proposed in this method can be achieved by performing the macro-steps presented in **Figure 1**.

### **Figure 1.**

*Stages of the method for mapping the state of the art and identifying gaps and trends of research. Source: Prepared by the authors.*

After defining the field to be studied, with which it is advisable that the researcher have some affinity, the first challenge in the bibliometric study is the choice of the scientific research platform to be used.

The choice of the scientific research platform has a significant impact on the bibliometric analysis and must therefore be well planned in order to obtain accurate results and avoid rework. In practical terms, research platforms offer different tools for mining scientific data. Thus, the researcher should check which data that best reflect the objectives of the study can be extracted from each platform. This analysis is essential for the bibliometric study to meet the researcher's expectations and to be of high quality.

Examples of scientific research platforms with robust databases and a reasonable range of search filters include Scopus (www.scopus.com) and Web of Science (WoS) (www.webofknowledge.com). These platforms provide access to thousands of scientific articles published by publishers such as Elsevier (www.sciencedirect.com), Emerald (www.emeraldinsight.com), Springer (www.springerlink.com), Wiley (www.wiley.com), and Taylor & Francis (www.tandfonline.com), among others. EBSCO (www.ebsco.com), Crossref (www.crossref.org), and Google Scholar (scholar.google.com) are other multidisciplinary platforms also used by researchers. In addition to these, there are many platforms specific to the different fields of knowledge.

Combining two or more platforms for mining scientific data can result in a more consistent bibliometric analysis. On the other hand, it is more difficult to integrate information from platforms with different structures, and although there are computational tools that support the integration of these data, they still require great improvements.

Besides the structural differences between the platforms, there are also differences in the classification of the information adopted by each of them. For example, if the same search criteria were applied to different platforms, the results returned may not be the same. The variation in the number of articles is explained by the different search parameters adopted and also by the particular coverage of each platform.
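The difference in coverage between two platforms can be made concrete by comparing the result lists of the same query. The sketch below is only an illustration: the function names and the sample titles are hypothetical, and matching on a normalized title is an assumed simplification (a DOI, when available, would usually be a more reliable key).

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace, so that
    small formatting differences between platforms do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def compare_platforms(titles_a, titles_b):
    """Count titles found on both platforms and on only one of them."""
    a = {normalize_title(t) for t in titles_a}
    b = {normalize_title(t) for t in titles_b}
    return {"both": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Hypothetical result lists returned for the same search criteria.
scopus_titles = ["Lean Six Sigma in Healthcare", "A Review of Lean Sigma"]
wos_titles = ["Lean six sigma in healthcare.", "Six Sigma Project Selection"]

summary = compare_platforms(scopus_titles, wos_titles)
print(summary)
```

A large share of titles found on only one platform suggests that using a single platform would leave part of the literature out of the analysis.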

The difference in the results generated by the platforms is discussed in [9–12].

It is therefore up to the researcher to identify the scientific research platforms that offer the largest collection of articles in the field of study. Throughout this chapter, the scientific platforms Scopus and WoS are used as the reference to operationalize the proposed method and to integrate bibliometric data from the two platforms. The main reasons for using Scopus and WoS are that these databases are multidisciplinary, since they cover a relevant and extensive collection of scientific publications, and that they offer various tools for scientific data mining.

Scopus and WoS enable the extraction of essential data for conducting bibliometric analysis. From these data, it is possible to compare the performance of the literature according to each platform and corroborate, complement, or refute the results.

In addition, the integration of the scientific data provided by these platforms makes it possible to obtain more robust results for the bibliometric analysis. **Figure 2** presents the bibliometric data provided by the Scopus and WoS platforms.

Once the scientific platforms to be used have been defined, it is necessary to establish the search criteria for the articles. In order to obtain high-quality data mining, the search strategy should reliably reflect the research topic, the study objectives, and the limits of the research field. The main search criteria used in the bibliometric analysis are terms specific to the field of study, publication period, document type and language, and area of knowledge.

### **Figure 2.**

*Bibliometric data provided by scientific platforms Scopus and WoS. Source: Prepared by the authors.*

The specific terms of the field under study should express the subject, its synonyms, and different spellings. For example, the theme "Lean Six Sigma" is often represented by its synonym "Lean Sigma" or "Lean 6 Sigma". For more accurate results, it is recommended to use the Boolean operators "AND" and "OR" to combine different expressions and make the theme more specific, thus improving the accuracy of the results returned. The specific terms of the field under study should be sought primarily in the titles, keywords, and abstracts of the articles.
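As an illustration of combining synonyms with Boolean operators, the sketch below builds a query string from groups of terms: variants inside a group are joined with "OR", and groups are joined with "AND". The helper names are hypothetical. The field codes `TITLE-ABS-KEY` (Scopus) and `TS=` (WoS) are used here on the assumption that they restrict the search to title, abstract, and keywords; the exact syntax should be checked against each platform's current search help.

```python
def or_group(terms):
    """Quote each term and join the variants with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*term_groups):
    """Join several OR-groups with AND to make the theme more specific."""
    return " AND ".join(or_group(group) for group in term_groups)


# Synonyms and spellings of the theme, as given in the text.
variants = ["Lean Six Sigma", "Lean Sigma", "Lean 6 Sigma"]
core = build_query(variants)

# Platform-specific wrappers (assumed syntax, see the note above).
scopus_query = f"TITLE-ABS-KEY({core})"
wos_query = f"TS=({core})"

print(scopus_query)
print(wos_query)
```

A second group, for example `build_query(variants, ["healthcare", "hospital"])`, narrows the theme with "AND" while keeping the synonyms of each concept together.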

The publication period filter should be defined according to the period the researcher wishes to cover in the bibliometric analysis. For a complete map of the state of the art, it is suggested that all studies be analyzed regardless of their year of publication. However, when a field of study has been investigated for many decades and the number of documents published is too high, it is advisable to limit the period analyzed, because the subjects addressed in the older papers were probably widely explored in the following ones. It should be highlighted that the analysis of gaps and trends for future research (discussed later in detail) generally needs to be based on recent publications [13].

Among the different types of documents available on scientific platforms, articles and reviews published in journals are the most reliable sources for reviewing the literature, since they are peer-reviewed in their full version [14]. It is recommended not to include conference papers, notes, letters, books, book chapters, editorials, doctoral theses, master's dissertations, and nonscientific publications, except where these types of documents are indispensable and relevant to the field studied. In medical research, however, letters to the editor and case reports should also be considered. Letters to the editor report discussions that are important for the development of medical research and are less common in other scientific fields. Case reports, in turn, present the most recent findings or the limits to which modern medicine and technology are progressing, and can be a channel for disseminating best practices and solutions.

It is also suggested that only studies published in English be selected, since this is the universal language of science.

Defining the area of knowledge makes it possible to limit the search to specific fields. However, if the researcher wishes to know the state of the art as a whole, it is recommended that the study consider all areas of knowledge. If the amount of data found is too extensive, the researcher may limit the investigation to a single area. **Table 1** presents an example of the application of the search criteria to the study field of Lean Six Sigma.
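The search criteria are normally applied through the platform's own filters, but the same logic can be written down explicitly, which helps when documenting or repeating a search. The following is a minimal sketch under stated assumptions: the `SearchCriteria` class, the record field names, and the period 2000 to 2018 are all hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchCriteria:
    first_year: int            # start of the publication period (hypothetical)
    last_year: int             # end of the publication period (hypothetical)
    document_types: frozenset  # document types to keep
    language: str              # language to keep


def matches(record: dict, criteria: SearchCriteria) -> bool:
    """Return True when a record satisfies every criterion."""
    return (
        criteria.first_year <= record["year"] <= criteria.last_year
        and record["document_type"] in criteria.document_types
        and record["language"] == criteria.language
    )


criteria = SearchCriteria(2000, 2018, frozenset({"Article", "Review"}), "English")

# Hypothetical records; the keys are illustrative, not a platform export format.
records = [
    {"title": "A", "year": 2015, "document_type": "Article", "language": "English"},
    {"title": "B", "year": 2016, "document_type": "Conference Paper", "language": "English"},
    {"title": "C", "year": 1995, "document_type": "Review", "language": "English"},
    {"title": "D", "year": 2010, "document_type": "Review", "language": "Portuguese"},
    {"title": "E", "year": 2018, "document_type": "Review", "language": "English"},
]

selected = [r["title"] for r in records if matches(r, criteria)]
print(selected)
```

Keeping the criteria in one place makes it easy to report exactly which period, document types, and language were used.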

The results that meet the previously established criteria are displayed, and the researcher must analyze their quality. If the quality of the results is satisfactory, the researcher should save them in the user area of the search platform. The user area stores all the results obtained with the different applications of the search criteria. The articles saved in this user area form the database used for the bibliometric analysis. It is important to emphasize that, before starting the data analysis, the search results must be checked in order to eliminate articles that appear in the results but do not belong to the field of study.


### **Table 1.**

*Example of application of the search criteria for the study field "Lean Six Sigma".*

One common source of error in the mining of bibliometric data is the use of a word that has several meanings. For example, a search for the term "Lean" returns many articles that belong to the "Lean Manufacturing" field. However, it also returns many articles that use the word "lean" to refer to a characteristic of the human body.

For this reason, it is essential that the researcher screen the search results carefully. It is recommended that the title of the article be analyzed first to verify its suitability for the purposes of the study. If any doubt remains after examining the title, the abstract must be read. If the doubt persists, the full text must be analyzed to decide whether the bibliometric data of the article will be exported to the bibliometric analysis software.
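The title, abstract, full text sequence can be supported by a simple pre-sorting script that settles the obvious cases and flags the rest for reading. The sketch below is an assumption-laden aid, not a replacement for the researcher's judgment: the cue words in `ON_TOPIC` and `OFF_TOPIC` and the record keys are hypothetical and would need to be chosen for each field.

```python
# Hypothetical cue words for the "Lean" example; adapt them to the field studied.
ON_TOPIC = ("six sigma", "manufacturing", "production")
OFF_TOPIC = ("body mass", "muscle", "obesity")


def judge(text: str) -> str:
    """Return 'keep', 'drop', or 'unsure' based on cue words in the text."""
    lowered = text.lower()
    on = any(cue in lowered for cue in ON_TOPIC)
    off = any(cue in lowered for cue in OFF_TOPIC)
    if on and not off:
        return "keep"
    if off and not on:
        return "drop"
    return "unsure"


def screen(record: dict):
    """Check the title first, then the abstract; if both are inconclusive,
    flag the article for full-text reading."""
    for field in ("title", "abstract"):
        verdict = judge(record.get(field, ""))
        if verdict != "unsure":
            return verdict, field
    return "read full text", None


r1 = {"title": "Lean Six Sigma in hospitals", "abstract": ""}
r2 = {"title": "Lean tissue and ageing",
      "abstract": "Changes in lean body mass and muscle strength"}
r3 = {"title": "Lean thinking revisited", "abstract": "A conceptual discussion."}

for record in (r1, r2, r3):
    print(record["title"], "->", screen(record))
```

Articles flagged as "read full text" still require the manual decision described above.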

Scientific platforms provide various bibliometric data. Among the most relevant for the bibliometric analysis are article title, authors, journal, year of publication, number of citations, institutions, countries, keywords, and bibliographic references. The data to be exported should be selected on the scientific platform itself, as shown in **Figures 3** and **4**.

Scientific platforms offer different file formats for export. The most suitable formats are ".csv" and ".txt", which allow the bibliometric data to be analyzed in spreadsheets and/or bibliometric software.
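Both formats are delimited text and can also be read with a few lines of code. The sketch below uses Python's standard `csv` module to bring a comma-separated export and a tab-separated export into one common record layout. The column names (`Title`, `Year`, `Cited by` for the ".csv" file and `TI`, `PY`, `TC` for the ".txt" file) are assumptions about the export headers and should be checked against the actual files; the in-memory sample data are hypothetical.

```python
import csv
import io


def read_scopus_csv(handle):
    """Read a comma-separated export into a list of common-format records."""
    return [
        {"title": row["Title"], "year": int(row["Year"]),
         "citations": int(row["Cited by"] or 0), "source": "Scopus"}
        for row in csv.DictReader(handle)
    ]


def read_wos_txt(handle):
    """Read a tab-separated export into a list of common-format records."""
    return [
        {"title": row["TI"], "year": int(row["PY"]),
         "citations": int(row["TC"] or 0), "source": "WoS"}
        for row in csv.DictReader(handle, delimiter="\t")
    ]


# In-memory samples stand in for exported files; with real files one would
# use open(path, newline="", encoding="utf-8-sig") instead of io.StringIO.
scopus_file = io.StringIO('Title,Year,Cited by\n"Lean Six Sigma: a review",2012,40\n')
wos_file = io.StringIO("TI\tPY\tTC\nLean Six Sigma: a review\t2012\t35\n")

scopus_records = read_scopus_csv(scopus_file)
wos_records = read_wos_txt(wos_file)
print(scopus_records + wos_records)
```

Mapping both exports to the same keys is what later allows the data from the two platforms to be placed in a single table.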


### **Figure 4.**

*Selecting the data to be exported on the WoS platform [16].*

In order to make bibliometric analysis more accessible to the scientific community, the presented method recommends the use of open source software such as Calc (LibreOffice), Sci2 Tool (CNS), and VOSviewer (Leiden University).

Calc is a spreadsheet application, and its use serves two main purposes: integrating bibliometric data from different scientific platforms and analyzing them. The bibliometric data imported into the Calc spreadsheet and the respective bibliometric parameters generated are presented in **Table 2**.

Data can only be imported into the spreadsheet when the files are exported in the correct format. For the Scopus platform, the file should be exported in the ".csv" format. On the WoS platform, the data should be exported in the ".txt" format ("tab-separated" option). With the data imported into separate worksheets, the researcher must combine them into a single worksheet and sort it by article title. This allows duplicate records to be identified and eliminated.

The bibliometric parameter "evolution of publications" is based on the publication year of each article. To obtain it, the spreadsheet is used to count the number of articles published per year. This parameter reveals the dynamics of the publications: whether the topic has been widely explored by researchers or whether the interest of the scientific community is declining. The bibliometric data on research areas are used to quantify which areas are the most researched. The information obtained by this bibliometric parameter

| Bibliometric data | Bibliometric parameters |
| --- | --- |
| Year of publication | Evolution of publications |
| Research areas | Most researched areas |
| Keywords | Most used keywords |
| Article title | Most cited articles |
| Authors | Most cited authors |

### **Table 2.**

*Imported bibliometric data and bibliometric parameters generated. Source: Prepared by the authors.*
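The spreadsheet operations described in this section (combining the records, sorting by title, eliminating duplicates, and counting publications per year or keyword occurrences) can also be sketched in code. The example below is an illustration only and is not the Calc procedure itself: the function names, the record keys, and the sample data are hypothetical, and treating two records as duplicates when their normalized titles match is an assumed simplification.

```python
import re
from collections import Counter


def title_key(title: str) -> str:
    """Normalize a title so the same article matches across platforms."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_records(*sources):
    """Combine record lists, keep the first record seen for each title,
    and return the result sorted by title."""
    merged = {}
    for records in sources:
        for record in records:
            merged.setdefault(title_key(record["title"]), record)
    return sorted(merged.values(), key=lambda r: title_key(r["title"]))


def publications_per_year(records):
    """Evolution of publications: number of articles per publication year."""
    return dict(sorted(Counter(r["year"] for r in records).items()))


def most_used_keywords(records, top=3):
    """Most used keywords, counted case-insensitively."""
    counts = Counter(k.strip().lower() for r in records for k in r["keywords"])
    return counts.most_common(top)


# Hypothetical records from two platforms; the first article appears in both.
scopus = [
    {"title": "Lean Six Sigma: A Review", "year": 2012,
     "keywords": ["Lean Six Sigma", "Quality"]},
    {"title": "Lean Sigma in Services", "year": 2014,
     "keywords": ["Lean Sigma", "Services"]},
]
wos = [
    {"title": "Lean six sigma: a review", "year": 2012,
     "keywords": ["lean six sigma"]},
    {"title": "Six Sigma Project Selection", "year": 2014,
     "keywords": ["Six Sigma", "Lean Six Sigma"]},
]

merged = merge_records(scopus, wos)
print(len(merged), publications_per_year(merged), most_used_keywords(merged, top=1))
```

The same counting pattern applies to the other rows of Table 2, for example counting research areas instead of years.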
