Pollination is considered a key element of ecosystem services (Daily, 1997). Ollerton et al. (2011) estimated that the proportion of animal-pollinated species is near 78% in temperate-zone communities and 94% in tropical communities, and that the global number of animal-pollinated angiosperms is 308 006, which means 87.5% of the estimated species-level diversity of flowering plants.

The decline of pollinators has received attention since the 1990s (Buchmann and Nabhan, 1996; Kearnes et al., 1998). Recently, multiple drivers have been suggested as the main causes of this decline (Schweiger et al., 2010; Potts et al., 2010), such as loss and fragmentation of habitat, aggressive agricultural practices, pathogens, invasive species and climate change.

To achieve a better understanding of the threats to pollinator species, new research approaches are necessary, especially considering the need to build useful public policies to protect them. Here, we discuss new approaches to research on pollinators, especially bees, based mostly on Information Technology tools such as biodiversity databases, DNA barcoding, morphometric analysis and species distribution modeling. At the end, a case study is presented, considering some Brazilian bee species and the potential impact of climate change on their distribution.

**2. Digitization and sharing of biological collection data**

According to Chapman (2009), the Earth's biodiversity is estimated to comprise approximately 11.3 million species, of which fewer than 2 million have been formally described by science. These figures reveal how limited our knowledge is, which is a key issue for the preservation and sustainable use of biodiversity and ecosystem services. To fill that gap, more field data are necessary to discover and map biodiversity before it is gone. Nonetheless, a lot can be done with the existing data if they become more available and if other techniques are applied to analyze them.

Traditionally, biodiversity primary data are hosted in biological collections distributed around the world. They vary broadly in size and organization, ranging from large, structured, well-documented and maintained museum collections to small sets of specimens kept by individual researchers with limited resources. Both kinds of data source are important, as they may cover different gaps, taxonomic and geographic, in our quest to know life on Earth.

The most traditional users of biological collections have been taxonomists and systematists, who use them for identifying, naming and classifying species, and for studying the diversity of species and the relationships among them through time (Baird, 2010). However, while these studies are essential for the development of other disciplines, such as ecology, biological collections are also essential data sources for answering questions that interest, and may involve, many more individuals and knowledge areas, including basic biology, human economics, and public health.

Typically, they help address questions on natural resource inventories; on the occurrence and distribution of species over space and time; on the reasons for changes that may have occurred; and on the effect of environmental change, including climate change, on biodiversity (Scoble, 2010). This applies to native (wild) species, to economically important species, to infectious disease vectors, and to invasive species, for which distribution prediction can be very helpful.

Despite the broad use already in place, biological collection data still have great potential for use in research, in natural and agricultural resource management, in education and in sustainability science (Scoble, 2010).

Broader, more open and easier access to specimen data is vital to distribute information and, in turn, create knowledge (Canhos et al., 1994; Baird, 2010). For this to become effective, however, it is necessary to digitize the data and make them available on the web. Only then will we be able to make full use of the wealth of data and information that is, in many cases, hardly accessible in collections throughout the world and that often can provide a better picture of a species only when integrated.

The digitization of collection data is in itself a challenge. It requires a significant effort in cost and time, which sometimes competes with other demands on those who digitize. The cost-effectiveness of data digitization is not easy to prove, especially when resources are scarce, although its scientific value is generally agreed upon. Where an economically important question can directly benefit from the data, this is less of a problem. Since both the volume and the quality of data are essential, digitization on a larger scale demands that the effort be prioritized, focused and sustained, according to Scoble (2010). The author also notes that the digitization effort required differs among taxa, such as plants and insects, as a result of the methods used for mounting the specimens and the labels that contain the data to be digitized.

Currently, biological data digitization is a global effort led by institutions such as GBIF and TDWG. The Global Biodiversity Information Facility (GBIF, www.gbif.org) was created in 2001 following a recommendation from a working group of the Megascience Forum of the Organization for Economic Cooperation and Development (OECD), and is open to participation by any country or international organization that agrees with its purpose of making scientific biodiversity information freely available. Its three core services and products are: "1. an information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data; 2- Community-developed tools, standards and protocols – the tools data providers need to format and share their data; and 3 - Capacity-building – the training, access to international experts and mentoring programs that national and regional institutions need to become part of a decentralized network of biodiversity information facilities". Besides developing tools to be used by itself and by others, such as a data portal, GBIF provides access to more than 276 million occurrence records (including specimens and observations), integrating data from providers all over the world into a single access point.
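The idea of a single access point over many providers can be illustrated with a minimal sketch. It is not GBIF's implementation: the provider names, record values and the helper functions `build_index` and `occurrences` are hypothetical, and the field names are borrowed from Darwin Core term names purely as an assumption of the example.

```python
from collections import defaultdict

# Hypothetical records from two hypothetical data providers (illustrative values only).
# Field names follow Darwin Core term names; this is an assumption of the sketch.
MUSEUM_RECORDS = [
    {"institutionCode": "MUSEUM-A", "catalogNumber": "0001",
     "scientificName": "Apis mellifera", "basisOfRecord": "PreservedSpecimen",
     "decimalLatitude": -23.55, "decimalLongitude": -46.63},
    {"institutionCode": "MUSEUM-A", "catalogNumber": "0002",
     "scientificName": "Melipona quadrifasciata", "basisOfRecord": "PreservedSpecimen",
     "decimalLatitude": -22.90, "decimalLongitude": -47.06},
]
FIELD_RECORDS = [
    {"institutionCode": "LAB-B", "catalogNumber": "17",
     "scientificName": "Apis mellifera", "basisOfRecord": "HumanObservation",
     "decimalLatitude": -15.78, "decimalLongitude": -47.93},
    # The same specimen as MUSEUM-A/0001, republished by the second provider.
    {"institutionCode": "MUSEUM-A", "catalogNumber": "0001",
     "scientificName": "Apis mellifera", "basisOfRecord": "PreservedSpecimen",
     "decimalLatitude": -23.55, "decimalLongitude": -46.63},
]


def build_index(*providers):
    """Merge provider record lists into one index keyed by scientific name.

    A record is identified by (institutionCode, catalogNumber), so a specimen
    published by more than one provider is indexed only once.
    """
    index = defaultdict(list)
    seen = set()
    for records in providers:
        for record in records:
            key = (record["institutionCode"], record["catalogNumber"])
            if key in seen:
                continue  # skip duplicates of an already indexed record
            seen.add(key)
            index[record["scientificName"]].append(record)
    return dict(index)


def occurrences(index, scientific_name):
    """Return all indexed records for a species, or an empty list."""
    return index.get(scientific_name, [])


index = build_index(MUSEUM_RECORDS, FIELD_RECORDS)
for name in sorted(index):
    print(name, len(index[name]))
```

Because every provider uses the same field names, the index can mix specimen and observation records for one species without any per-provider conversion, which is the practical benefit of agreeing on a shared exchange format.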

Other regional and national initiatives have collaborated and participated actively in the global effort towards digitizing and standardizing biological data: in Europe, ENHSIN (European Natural History Specimen Information Network) and EDIT (European Distributed Institute of Taxonomy); in the Americas, IABIN (InterAmerican Biodiversity Information Network, www.iabin.net), with data from many countries in the continent.

Biodiversity Information Standards (TDWG, www.tdwg.org), also known as the Taxonomic Databases Working Group, was originally formed to establish international collaboration among biological database projects. It now focuses on the development of standards for the exchange of biological data, and its mission also includes promoting those standards. Perhaps the most important existing standard is Darwin Core (DwC), a standard for the exchange of biological information. It is primarily based on taxa and their occurrence in nature as documented by observations, specimens and samples, and related information. Another important standard is a protocol for data exchange, TAPIR (TDWG Access Protocol for Information Retrieval, www.tdwg.org/standards), which allows data to be exchanged among different systems using agreed-upon standards such as DwC.

One current trend within the biodiversity informatics community is to develop new standards for other content, expanding from the current specimen/observation focus to other aspects of biodiversity, such as genomic data, interaction data (Saraiva et al., 2009), species data, and multimedia data such as images. The new content will broaden the scope of the data networks and offer new possibilities for data analysis, hopefully making it possible to address issues that are even closer to societal needs and that cut across different disciplines.

Specimen and observation data are fundamental for developing distribution models using ecological niche concepts. Molecular data and images are key to identifying a specimen and to studying the relationships between individuals and populations.

an accurate organization of large data bank available to the general public at the CBOL (Consortium for the Barcode of Life) website (Mitchell, 2008).

In bees, some studies corroborate the effectiveness of the technique in species identification. The complete bee fauna of a taxonomically well-resolved region was tested, and the 150 species were correctly identified. The authors also identified some cryptic species and associated individuals of different sexes with the same species. In this last case, most of the species descriptions had been based on individuals of only one sex (Sheffield et al., 2009).

Another example is the use of this approach, combined with traditional morphological analysis, in a study of a taxonomically extremely difficult group of bees, the subgenus *Dialictus* (family Halictidae; genus *Lasioglossum*). In this case, DNA barcoding proved essential for the delimitation of numerous species that were morphologically almost indistinguishable. The main conclusion of these studies is that DNA barcoding is efficient at detecting cryptic species, associating the sexes of dimorphic species and associating the castes of species with strong queen-worker dimorphism, and is a generally useful tool for basic identification (Gibbs, 2009; for a review of successful cases see Packer et al., 2009). A global campaign to barcode the bees of the world has been initiated (see the website at www.bee-bol.org).

**4. Morphometric analysis**

The first attempts to classify the subspecies of *Apis mellifera* were based mainly on differences in color and body size. However, since these parameters overlap greatly, most of the classification systems based on them failed to identify individuals correctly (Ruttner, 1988). In 1940, Goetze proposed a large number of measurements, to be taken from several parts of the bee body, in order to better differentiate the geographical ecotypes present across its wide geographical distribution, which encompasses Africa, Europe and parts of Asia.

However, all the analyses used until then were based on univariate statistics, which take into consideration only one measurement at a time; because the ranges of the measurements often overlap, precise identification was difficult. It was only after the works of DuPraw (1964, 1965a, b) that the use of multivariate statistics was proposed and, with the help of Principal Component Analysis and Discriminant Analysis, identifications became more precise. An important advance was also proposed in this series of works, where DuPraw (1965a) recommended the use of measurements that are independent of size, such as the angles between vein junctions in the wings, thereby avoiding environmental effects such as food availability, parasites and others.

After these proposals and a series of small studies, a guide to discriminating the subspecies of *A. mellifera* was published (Ruttner et al., 1978). In this work, the authors propose approximately 40 measurements to be taken from several parts of at least 20 bees per colony to achieve good confidence in the classification. It was on the basis of morphometric results that the existence of evolutionary branches in *A. mellifera* was proposed (Ruttner et al., 1978); these were later confirmed by mitochondrial DNA (Franck et al., 2000), microsatellites (Estoup et al., 1995) and SNPs (Withfield et al., 2006). In spite of being very informative and reliable, this kind of analysis is often very time consuming.

More recently, allied to the development of computational methods, the analyses became faster and some of them completely automated, such as ABIS (Automated Bee Identification
