**3. Methodology and data**

The empirical data used in this chapter consist of a set of 152,140 collaborations by scientists affiliated with different universities and published in journals indexed by the Science Citation Index Expanded (SCI Expanded), provided by the Thomson Reuters Web of Science (WoS). Socio-economic and humanities disciplines are excluded from our analysis. Our period of analysis is 2001–2010. The dataset was built following a procedure similar to that of Acosta et al. [10, 30]. Since our focus is at the university level, we had to harmonize the name variations of universities, which mainly stem from the use of the native versus the English name or from the use of different acronyms. Papers were then assigned to universities following the full counting process (crediting one publication to each co-author institution). Next, the data on academic collaboration were placed into a symmetric matrix containing all co-publications between university *i* and university *j*, which therefore excludes intra-university collaboration. Publications were classified into 12 scientific disciplines following the Centre for Science and Technology Studies (CWTS) classification, again using the full counting method for publications in journals assigned to more than one discipline.
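The steps described above (name harmonization, full counting, and the symmetric co-publication matrix) can be illustrated with a minimal sketch. It is not the authors' actual pipeline: the alias table, the university labels, and the function names `harmonize`, `copublication_counts`, and `to_symmetric_matrix` are hypothetical and serve only to make the counting rule concrete.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alias table: maps name variants (native name, English name,
# acronym) to one canonical label. Real data would need a curated list.
ALIASES = {
    "Universidade Alfa": "ALFA",
    "Alfa University": "ALFA",
    "UA": "ALFA",
}


def harmonize(name):
    """Return the canonical university label (unchanged if no alias is known)."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def copublication_counts(papers):
    """Count co-publications per unordered university pair (full counting).

    `papers` is an iterable of lists of affiliation names, one list per paper.
    Each paper adds one co-publication to every pair of distinct universities
    on its byline, so intra-university co-authorship is never counted.
    """
    counts = Counter()
    for affiliations in papers:
        universities = sorted({harmonize(a) for a in affiliations})
        for u, v in combinations(universities, 2):
            counts[(u, v)] += 1
    return counts


def to_symmetric_matrix(counts, universities):
    """Place pair counts into a symmetric matrix with a zero diagonal."""
    index = {u: k for k, u in enumerate(universities)}
    n = len(universities)
    matrix = [[0] * n for _ in range(n)]
    for (u, v), c in counts.items():
        i, j = index[u], index[v]
        matrix[i][j] = c
        matrix[j][i] = c
    return matrix
```

In this sketch a paper signed only by name variants of the same university contributes nothing, which mirrors the exclusion of intra-university collaboration.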


Patterns of Academic Scientific Collaboration at a Distance: Evidence from Southern European…

http://dx.doi.org/10.5772/intechopen.77370


Scientometrics

In a further step, we matched this dataset with the EUMIDA dataset (Data Collection 1) in order to obtain information on the organizational characteristics of the universities. EUMIDA is the result of an initiative of the European Commission to provide a complete census of European universities; it offers information at the university level, including organizational details such as the education offered and the staff employed.<sup>2</sup> Our final sample includes only those universities that were present in both datasets, that is, 175. Consequently, there are potentially (175 × 174) ÷ 2 = 15,225 collaboration links (observations). Additional information on regional Gross Domestic Product (GDP) and R&D expenditures was extracted from Eurostat.

<sup>2</sup> A description of the data and the collection procedure is provided in EUMIDA 2010. Feasibility Study for Creating a European University Data Collection [Contract No. RTD/C/C4/2009/0233402]. Data collection 1 is available at http://ec.europa.eu/research/era/areas/universities/universities\_en.htm (accessed 18/10/2012). Data collection 2, which contains more detailed data, was not available to us at the time of this research.

In order to estimate the influence of the different proximity dimensions on university SC, we put forward several variables:

• Geographical distance (*Geodist*) is measured as the Euclidean distance between universities *i* and *j*.

• Cognitive distance (*Cogndist*) is captured by the correlation index of Paci and Usai [31], calculated on the 12-discipline composition of the scientific papers of university *i* and university *j* for the period 2001–2005. This coefficient ranges between zero (minimum distance, identical specialization) and one (maximum distance).

• Institutional proximity is measured by two binary variables. *Region* is a dummy variable that takes the value 1 when universities *i* and *j* are in the same region, and 0 otherwise. *Country* is a dummy variable that takes the value 1 when universities *i* and *j* are in the same country, and 0 otherwise.

• Social proximity (*Socialprox*) is represented by a dummy variable that takes the value 1 if universities *i* and *j* collaborated during the previous five-year period, 2001–2005. However, this indicator does not allow us to provide evidence on trends in social distance, since we did not have data on collaborations preceding the period 2001–2005.

• Organizational proximity is captured by two variables. *Educprox* is the correlation coefficient between the nine education fields, as identified in EUMIDA, of university *i* and university *j*. *Staffdist* is the absolute difference in the total staff of universities *i* and *j*. These variables refer to 2008, which is the reference year of the EUMIDA dataset.

• Economic distance is measured by three variables. *GDPdist* is the absolute difference in average GDP over 2004–2008 between the regions where universities *i* and *j* are located. *R&Ddist* is calculated similarly, using the absolute difference in higher education R&D expenditures as a percentage of GDP. *Convergence* is a dummy variable that equals one if the two universities are located in convergence regions, and zero otherwise.

Note that the description of the variables refers to data for all 12 disciplines. For the separate descriptives by discipline, collaborations and previous collaborations refer to the respective counts for that specific discipline. At the discipline level, *Cogndist<sub>ij</sub>* represents the dissimilarity in specialization in a given discipline. Since it cannot be calculated as a correlation coefficient or as the Paci and Usai index [31], it was calculated following a different procedure for the models by discipline: first, we calculated for each university the share of publications in each discipline over its total number of publications; second, we obtained the absolute difference in this indicator for each pair of universities.

It is worth noting that the organizational proximity measures attempt to capture a complex phenomenon that is difficult to measure. We therefore chose the differences in educational profiles and size as factors capturing organizational characteristics that may shape the universities' culture or orientation. In addition, we did not have access to R&D funding data at the level of institutions, so we have included the amount of R&D expenditures in the region in which the university is located.

In order to identify trends in scientific collaboration, we calculate the descriptives of the distances for those pairs collaborating during 2001–2005 and then for those pairs collaborating during 2006–2010. **Table 1** shows some descriptives on the collaborations in our sample: the number of collaborating pairs increased from 3669 to 4775, and total collaborations increased substantially, by 51.38%. Of all possible pairs of universities, 24.10% collaborated in 2001–2005, rising to 31.36% in 2006–2010. The intensity of collaboration (the average number of collaborations per collaborating pair) increased from 16.50 to 19.19.

| | 2001–2005 | 2006–2010 |
|---|---|---|
| Pairs (a) | 15,225 | 15,225 |
| Collaborating pairs (b) | 3669 | 4775 |
| Total collaborations (c) | 60,522 | 91,618 |
| b/a | 24.10% | 31.36% |
| Collaboration intensity (c/b) | 16.50 | 19.19 |

**Source:** ISI Web of Science. Own elaboration.

**Table 1.** Number of collaborations and collaboration intensity 2001–2010.
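The derived figures in Table 1 follow from the raw counts by simple arithmetic, which the short sketch below reproduces. The helper names `possible_pairs` and `summarize` are illustrative only; the input counts are the ones reported in the text.

```python
def possible_pairs(n):
    """Number of unordered pairs among n universities: n(n - 1) / 2."""
    return n * (n - 1) // 2


def summarize(pairs, collaborating_pairs, collaborations):
    """Return the share of collaborating pairs (b/a) and the intensity (c/b)."""
    return {
        "share_collaborating": collaborating_pairs / pairs,
        "intensity": collaborations / collaborating_pairs,
    }


pairs = possible_pairs(175)               # 175 universities in the final sample
first = summarize(pairs, 3669, 60522)     # 2001-2005
second = summarize(pairs, 4775, 91618)    # 2006-2010
growth = 91618 / 60522 - 1                # relative change in total collaborations

print(pairs)
print(f"{first['share_collaborating']:.2%}", f"{second['share_collaborating']:.2%}")
print(f"{first['intensity']:.2f}", f"{second['intensity']:.2f}")
print(f"{growth:.2%}")
```

The two period totals also add up to the 152,140 collaborations of the full dataset (60,522 + 91,618).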

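The pairwise variables defined in this section can be sketched for a single pair of universities as follows. This is a hedged illustration, not the authors' code: the `University` record, its field names, and the use of planar coordinates for the Euclidean distance are assumptions. *Cogndist* is computed here with the discipline-level procedure (absolute difference in publication shares); the Paci and Usai index used for the all-discipline models is deliberately not reproduced.

```python
from dataclasses import dataclass
from math import dist, sqrt


@dataclass
class University:
    # All field names are hypothetical; they mirror the variables in the text.
    name: str
    xy: tuple          # planar coordinates (assumed projection)
    region: str
    country: str
    convergence: bool  # located in a convergence region
    staff: float       # total staff (EUMIDA, 2008)
    gdp: float         # regional average GDP, 2004-2008
    herd: float        # higher education R&D expenditure, % of GDP
    educ: list         # values for the nine EUMIDA education fields
    pubs: dict         # discipline -> number of publications


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def share(u, discipline):
    """Share of a university's publications that fall in one discipline."""
    return u.pubs.get(discipline, 0) / sum(u.pubs.values())


def pair_variables(a, b, discipline):
    """Proximity and distance variables for the pair (a, b)."""
    return {
        "Geodist": dist(a.xy, b.xy),
        "Cogndist": abs(share(a, discipline) - share(b, discipline)),
        "Region": int(a.region == b.region),
        "Country": int(a.country == b.country),
        "Educprox": pearson(a.educ, b.educ),
        "Staffdist": abs(a.staff - b.staff),
        "GDPdist": abs(a.gdp - b.gdp),
        "R&Ddist": abs(a.herd - b.herd),
        "Convergence": int(a.convergence and b.convergence),
    }
```

*Socialprox* is omitted because it depends on the co-publication counts of the earlier period rather than on the attributes of the two universities.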