Periodic Reporting for period 1 - OptimCS (Optimising big data from citizen science projects for biodiversity research)
Reporting period: 2020-11-01 to 2022-10-31
Objective 1: Develop algorithms to optimise sampling by citizen scientists in time and space.
Objective 2: Experimentally determine the willingness of citizen scientists to sample more strategically.
I developed a framework for sampling biodiversity using eBird citizen science data, demonstrated in the state of Florida. The framework combines land-cover variables (including tree cover, water cover, urban cover, and habitat heterogeneity) with observed species richness and the number of checklists in a structural equation model (SEM). The SEM showed a strong influence of land cover in predicting species richness: urban cover was the most strongly supported predictor, followed by habitat heterogeneity, tree cover, and water cover. The number of checklists at a site was also predicted by the percentage of urban cover and by habitat heterogeneity, suggesting that these two land-cover attributes influence where people submit eBird checklists. The number of checklists was also higher in grid cells with higher species richness. I found that a relatively small number of samples is needed to reach 95% sampling completeness when diversity estimation is focused on dominant species: 43, 64, 96, 123, 172, and 176 samples for 5 × 5, 10 × 10, 15 × 15, 20 × 20, 25 × 25, and 30 × 30 km² grain sizes, respectively.
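To make the idea of sampling completeness concrete, the sketch below estimates sample coverage from checklist (incidence) data and searches for the smallest number of checklists whose mean coverage reaches a target. It is an illustration under stated assumptions, not the method used in the project: the report does not name its estimator, and the coverage formula used here (a Chao-Jost-style estimator for incidence data) is not the dominant-species measure behind the figures above. The function names, the resampling scheme, and the toy data are all hypothetical.

```python
import random


def sample_coverage(checklists):
    """Estimate sample coverage from incidence data.

    checklists: list of sets, one set of species names per checklist.
    Assumption: Chao-Jost-style coverage estimator for incidence data,
    chosen for illustration only.
    """
    t = len(checklists)
    counts = {}
    for checklist in checklists:
        for species in checklist:
            counts[species] = counts.get(species, 0) + 1
    u = sum(counts.values())  # total number of species incidences
    if t == 0 or u == 0:
        return 0.0
    q1 = sum(1 for c in counts.values() if c == 1)  # species on one checklist
    q2 = sum(1 for c in counts.values() if c == 2)  # species on two checklists
    if q1 == 0:
        return 1.0
    denom = (t - 1) * q1 + 2 * q2
    if denom == 0:
        # Single checklist: every species is unique, so coverage is unknown;
        # report 0.0 as a conservative value.
        return 0.0
    return 1.0 - (q1 / u) * ((t - 1) * q1 / denom)


def checklists_needed(checklists, target=0.95, n_draws=50, seed=0):
    """Smallest k whose mean coverage over random k-checklist subsets
    reaches `target`; None if the target is never reached."""
    rng = random.Random(seed)
    for k in range(1, len(checklists) + 1):
        mean_cov = sum(
            sample_coverage(rng.sample(checklists, k)) for _ in range(n_draws)
        ) / n_draws
        if mean_cov >= target:
            return k
    return None


if __name__ == "__main__":
    # Toy data for one hypothetical grid cell.
    toy = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
    print(round(sample_coverage(toy), 3))
```

Applied per grid cell and per grain size, a search of this kind would yield one required sample count per grain size, which is the shape of the result reported above.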
Additionally, I quantified the differences in bias between unstructured and semi-structured citizen science data. I found strong evidence that large-bodied birds were over-represented in the unstructured citizen science dataset; moderate evidence that common species were over-represented; strong evidence that species occurring in large groups were over-represented; and no evidence that colorful species were over-represented. These results suggest that unstructured citizen science data are biased relative to semi-structured data, likely as a result of species detectability and the inherent recording process. Importantly, in programs like iNaturalist the detection process has two steps: first, an individual organism needs to be detected, and second, it needs to be photographed, which is likely easier for many large-bodied species. The results indicate that caution is warranted when using unstructured citizen science data in ecological modelling. They also highlight body size as a fundamental trait that can serve as a covariate when modelling opportunistic species occurrence records, as a proxy for detectability or identifiability in unstructured citizen science datasets.
Objective 2: Experimentally determine the willingness of citizen scientists to sample more strategically.
Using a novel experimental design, I found that individuals were willing to sample more strategically and that biodiversity sampling can be improved through behavioral nudges. I developed a framework to estimate the ‘priority’ of a given citizen science sample based on species richness, and then ran a controlled study in which participants were asked to contribute according to where the ‘highest priority’ grid cells were. Relative to the available area in a study region, low priority cells accounted for 36% (Lake Macquarie) and 61% (Wingecarribee) of sampling in the control regions, compared with only 12% (Wollongong) and 23% (Central Coast) in the regions presented with a dynamic map, and only 15% (Blue Mountains) and 22% (Hornsby) in the regions presented with both a dynamic map and a leaderboard. High priority cells accounted for 32% (Lake Macquarie) and 15% (Wingecarribee) of sampling in the control regions, compared with 58% (Wollongong) and 8% (Central Coast) in the regions presented with a dynamic map, and 73% (Hornsby) and 67% (Blue Mountains) in the regions presented with both a dynamic map and a leaderboard.
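As a rough illustration of how percentages like those above could be tabulated, the sketch below computes the share of samples that fall in each priority class, alongside the share of grid cells (area) in that class for comparison. It is a minimal sketch under assumptions: the report does not describe how priority classes were assigned or how sampling was normalised by available area, so the three-level labels, the function name, and the toy data are hypothetical.

```python
from collections import Counter

PRIORITY_LEVELS = ("low", "medium", "high")  # assumed three-level scheme


def sampling_share_by_priority(cell_priority, sample_cells):
    """Summarise sampling effort by priority class.

    cell_priority: dict mapping grid-cell id -> "low" | "medium" | "high"
    sample_cells:  list of grid-cell ids, one entry per submitted sample
    Returns, per class, the share of samples and the share of cells.
    """
    sample_counts = Counter(cell_priority[cell] for cell in sample_cells)
    area_counts = Counter(cell_priority.values())
    n_samples = len(sample_cells)
    n_cells = len(cell_priority)
    return {
        level: {
            "sample_share": sample_counts[level] / n_samples if n_samples else 0.0,
            "area_share": area_counts[level] / n_cells if n_cells else 0.0,
        }
        for level in PRIORITY_LEVELS
    }


if __name__ == "__main__":
    # Toy study region with four grid cells and five samples.
    priority = {1: "low", 2: "low", 3: "medium", 4: "high"}
    samples = [4, 4, 4, 1, 3]
    for level, shares in sampling_share_by_priority(priority, samples).items():
        print(level, shares)
```

Comparing `sample_share` with `area_share` for a class shows whether that class received more or less sampling than its area alone would suggest, which is the kind of contrast drawn between the control and treatment regions.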