*Wildlife Population Monitoring*

respectively. As a prominent example, the eBird project [21, 35], launched in 2002 by the Ornithology Lab at Cornell University and the National Audubon Society, had by November 2016 engaged over 330,000 bird watchers from more than 250 countries, who reported observations of over 10,300 bird species. As of June 2018, eBird had accumulated over 500 million bird observations in its global database; in recent years, more than 100 million bird observations have been added to the database each year.

Wildlife data contributed by participants in such citizen science projects are a form of geospatial big data [36, 37]. Complex patterns can be discovered from such intensive data through visualizations, simulations, data mining, and various modeling techniques, providing valuable insight for forming concrete hypotheses about the underlying ecological, biological, and geographical processes that generated the observed data [37]. Thus, the abundance of citizen-contributed wildlife data has the potential to shift the research paradigm in biological, ecological, and geographical studies from the traditional hypothesis-driven approach to the emerging data-driven approach; for instance, scholars are promoting the idea of "data-intensive science" for biodiversity studies and "data-driven geography" [36–38].

**2.2 The (dis)advantages of citizen science for collecting wildlife data**

Citizen science has several advantages as an alternative mechanism for collecting wildlife data. Citizen-contributed data contain rich local information that spans a wide temporal spectrum because citizens, as local experts and sensors [20], have long been sensing and accumulating knowledge of their respective areas. Citizen science also has the potential to provide wildlife data over large areas, given that billions of networked human sensors are distributed across the globe. In addition, citizen science can provide timely updated wildlife data that are difficult to obtain and maintain through other techniques but can be easily elicited from citizens living in the local areas. Moreover, citizen-contributed data are much less expensive than data collected through traditional scientific protocols (e.g., biological surveys). In many cases, citizens contribute data purely voluntarily [20]. This low cost is of great practical significance for the many real-world programs that fall short of funding support. Taken together, these advantages make it possible to obtain timely updated wildlife data over large areas. Citizen science thus has great potential to support and sustain long-term wildlife population monitoring at large spatial scales (e.g., eBird) and to provide wildlife data for wildlife habitat assessment.

In spite of these strengths, one should be aware of the shortcomings of the "citizen science" approach to wildlife data collection. For example, this approach cannot be used in sparsely populated areas where sufficient local citizen observers/informants are lacking. It is also poorly suited to collecting data on evasive animals that have little contact with humans. Most importantly, there can be data quality issues associated with wildlife data contributed by volunteer citizens (i.e., non-professionals), which make the data challenging to standardize and analyze [17, 18]. The following sections detail some of these data quality issues, their implications for wildlife habitat assessment, and how GIS techniques (geovisualization, geospatial analysis, geocomputation, etc.) can be adopted to reduce the impact of such issues on wildlife habitat assessment.

**2.3 The data quality issues of citizen-contributed wildlife data**

The quality of citizen-contributed wildlife data is the major concern when using such data for wildlife habitat assessment. The average citizens engaged in citizen science projects are not well-trained professionals; their voluntary data collection

### *2.3.1 Data credibility*

To be useful for wildlife habitat assessment, wildlife data (e.g., sightings) reported by citizen participants need to be credible, that is, they must reflect ground-truth wildlife observations. Data credibility is affected by the characteristics of both the wildlife and the citizen observers (e.g., local residents). On the one hand, local residents often observe only wildlife that is active in the daytime, and the target wildlife should be easily recognizable to reduce misidentification, given that local residents usually have no training in species identification [17, 40]. On the other hand, local residents' knowledge of the target wildlife, age, length of residence, and formal education also influence data credibility [41]. For instance, performance in georeferencing tasks differs between novice and expert citizen participants [42], and there are both between-observer differences [43] and within-observer differences over time [44] in the bird-counting skills of BBS participants.

Various methods have been developed to increase the credibility of citizen-contributed wildlife data. Ref. [45] identified 12 strategies that citizen science programs have adopted to increase data credibility across program stages, including training and planning, data collection, and data analysis and program evaluation. As an example, eBird uses a two-part approach to assure data credibility during data entry [39]: automated data quality filters flag records for review based on observation date and geographic location, and a flagged entry, once confirmed as legitimate by the observer, is then reviewed again by a regional expert.
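The automated first stage of such a two-part review can be pictured as a lookup against expected occurrence windows. The following is a minimal sketch only, not eBird's actual implementation: the `Sighting` record, the `EXPECTED_MONTHS` table, the region and species labels, and the `needs_review` function are all hypothetical names introduced for illustration, and the month-window rule is an assumed simplification of a date-and-location filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sighting:
    species: str
    region: str
    observed_on: date


# Hypothetical filter table: (region, species) -> (first_month, last_month),
# the inclusive window in which the species is expected in that region.
EXPECTED_MONTHS = {
    ("region_A", "summer_visitor"): (4, 9),
    ("region_A", "winter_visitor"): (11, 2),  # window wraps over the new year
}


def needs_review(sighting, expected=EXPECTED_MONTHS):
    """Return True if the record should be flagged for review."""
    window = expected.get((sighting.region, sighting.species))
    if window is None:
        # Species is not on the expected list for this region at all.
        return True
    first, last = window
    month = sighting.observed_on.month
    if first <= last:
        in_window = first <= month <= last
    else:
        # Window spans the year boundary (e.g., November to February).
        in_window = month >= first or month <= last
    return not in_window
```

In this sketch a flagged record is not rejected; it is only marked so that the observer and then an expert reviewer can examine it, mirroring the two-part approach described above.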

### *2.3.2 Positional accuracy*

The positions of wildlife data used for habitat suitability mapping need to be accurate so that the environmental conditions at those locations can be reliably extracted from environmental data layers. Insufficient positional accuracy leads to a mismatch between the locations of wildlife habitat use and the corresponding environmental conditions, and thus degrades the accuracy of environmental niche modeling and habitat suitability mapping [46].

Nonetheless, the impact of the positional accuracy of wildlife data on habitat suitability mapping depends on the spatial resolution at which the mapping is conducted. Mapping at high spatial resolution (e.g., using environmental data on 30 m × 30 m grids) requires wildlife data whose positional accuracy is comparable to the spatial resolution of the environmental data, so that the values of the environmental conditions at these locations can be accurately extracted from the environmental data layers. In contrast, for mapping at coarse spatial resolution (e.g., 1000 m × 1000 m grids), the absolute positional accuracy of wildlife data does not have to be very high, as long as it is accurate enough relative to the spatial resolution of the environmental data in use.
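The dependence on resolution can be illustrated with a small check of whether a sighting's positional uncertainty could move it into a neighboring grid cell. This is a hedged sketch under stated assumptions, not a method from the cited literature: coordinates are assumed to be projected and in meters, the grid is assumed to have square cells with its origin at (0, 0), positional error is approximated as a circle of radius `error_radius`, and both function names are hypothetical.

```python
def min_distance_to_cell_edge(x, y, cell_size):
    """Distance (m) from point (x, y) to the nearest edge of its grid cell.

    Assumes square cells aligned to a grid whose origin is (0, 0).
    """
    dx = x % cell_size  # offset from the cell's left edge
    dy = y % cell_size  # offset from the cell's bottom edge
    return min(dx, cell_size - dx, dy, cell_size - dy)


def stays_in_one_cell(x, y, error_radius, cell_size):
    """True if the whole uncertainty circle lies inside a single grid cell,
    so the extracted environmental value cannot change due to the error."""
    return error_radius <= min_distance_to_cell_edge(x, y, cell_size)


# The same sighting, with an assumed 20 m positional error, on two grids.
fine = stays_in_one_cell(515.0, 520.0, error_radius=20.0, cell_size=30.0)
coarse = stays_in_one_cell(515.0, 520.0, error_radius=20.0, cell_size=1000.0)
```

With these illustrative numbers, `fine` is `False` (the point lies 5 m from a cell edge on the 30 m grid) and `coarse` is `True` (the nearest edge of the 1000 m cell is 480 m away), which matches the argument that the same absolute error matters at fine resolution but not at coarse resolution.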

### *2.3.3 Spatial bias*

Wildlife observations contributed by citizens are often concentrated more in some geographic areas than in others (i.e., spatial bias) because observations made by citizens are opportunistic in nature [23]. Well-designed sampling or survey schemes allocate observation sites so that the geographic space and/or the environmental space are well covered. In contrast, the spatial distribution of the observation efforts of citizen volunteers is neither random nor regular in the sense of sampling or survey design. Wildlife sightings elicited from local residents illustrate this. Local residents are not intentionally tracking the wildlife of interest. Instead, they typically spot the wildlife en route to doing something else. The routes on which local citizens spot wildlife are neither random nor regular but "ad hoc" [23]. As a result, wildlife sightings elicited from local residents are usually concentrated in areas with higher route accessibility.
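A simple way to see such concentration in a set of sightings is to count records per grid cell. The sketch below is an illustrative diagnostic only and is not taken from the cited studies: it assumes projected coordinates in meters and a square grid with origin (0, 0), and the function names `cell_counts` and `busiest_cell_share` are hypothetical.

```python
import math
from collections import Counter


def cell_counts(points, cell_size):
    """Count sightings per grid cell, keyed by (column, row) index."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in points
    )


def busiest_cell_share(counts):
    """Fraction of all sightings that fall in the single busiest cell."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Made-up coordinates: four sightings clustered together, one far away.
sightings = [(10, 10), (20, 15), (30, 40), (45, 5), (950, 950)]
counts = cell_counts(sightings, cell_size=100)
share = busiest_cell_share(counts)
```

Here four of the five made-up sightings fall in one 100 m cell, so `share` is 0.8. A high share together with few occupied cells suggests that observation effort, rather than habitat use alone, may be shaping the pattern.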

Such spatial bias in wildlife data has a significant impact on environmental niche modeling and habitat suitability mapping for wildlife habitat assessment. Because of the spatial bias, citizen-contributed wildlife data might not be representative of actual wildlife habitat use, and the relationship between wildlife occurrence and environmental conditions derived from such data might not represent the underlying environmental niche well. Spatial bias in citizen-contributed wildlife data, if not appropriately accounted for, would adversely affect the accuracy of wildlife habitat suitability mapping [47–49].
