Spatial Clustering of Citizen Science Data Improves Downstream Species Distribution Models
- URL: http://arxiv.org/abs/2412.15559v3
- Date: Fri, 17 Jan 2025 02:50:03 GMT
- Title: Spatial Clustering of Citizen Science Data Improves Downstream Species Distribution Models
- Authors: Nahian Ahmed, Mark Roth, Tyler A. Hallman, W. Douglas Robinson, Rebecca A. Hutchinson
- Abstract summary: Occupancy modeling accounts for imperfect detection by modeling the observation process separately from the biological process of habitat selection.
Existing approaches for constructing sites discard some observations and/or consider only geographic distance and not environmental similarity.
We compare ten approaches for site construction in terms of their impact on downstream species distribution models for 31 bird species in Oregon.
- Abstract: Citizen science biodiversity data present great opportunities for ecology and conservation across vast spatial and temporal scales. However, because these data are collected opportunistically, they lack the sampling structure required by modeling methodologies that address a pervasive challenge in ecological data collection: imperfect detection, i.e., the chance of missing species that are present during field surveys. Occupancy modeling is an example of an approach that accounts for imperfect detection by explicitly modeling the observation process separately from the biological process of habitat selection. This produces species distribution models that speak to the pattern of the species on a landscape after accounting for imperfect detection in the data, rather than the pattern of species observations corrupted by errors. To achieve this benefit, occupancy models require multiple surveys of a site across which the site's status (i.e., occupied or not) is assumed constant. Since citizen science data are not collected under the required repeated-visit protocol, observations may be grouped into sites post hoc. Existing approaches for constructing sites discard some observations and/or consider only geographic distance and not environmental similarity. In this study, we compare ten approaches for site construction in terms of their impact on downstream species distribution models for 31 bird species in Oregon, using observations recorded in the eBird database. We find that occupancy models built on sites constructed by spatial clustering algorithms perform better than existing alternatives.
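The requirement that a site be surveyed several times while its status stays fixed is what lets an occupancy model separate detection from occupancy. A minimal Python sketch of the standard single-season occupancy likelihood illustrates this, assuming a constant occupancy probability `psi`, a constant per-visit detection probability `p`, independent visits, and no false detections. For a site with 0/1 detections y_1..y_K, the likelihood is psi * prod_j p^y_j (1 - p)^(1 - y_j) + (1 - psi) * [all y_j = 0]. The abstract does not state which occupancy model or fitting procedure the authors use, so the function names, the toy data, and the grid-search fit are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers: constant psi and p, no covariates, no false positives.


def site_likelihood(history: Sequence[int], psi: float, p: float) -> float:
    """Probability of one site's 0/1 detection history across its repeated visits."""
    # If the site is occupied, each visit independently detects with probability p.
    if_occupied = 1.0
    for y in history:
        if_occupied *= p if y == 1 else 1.0 - p
    # If the site is unoccupied, the only possible history is all non-detections.
    if_unoccupied = 1.0 if all(y == 0 for y in history) else 0.0
    # Occupancy status is assumed constant across all visits to the site.
    return psi * if_occupied + (1.0 - psi) * if_unoccupied


def log_likelihood(histories: Sequence[Sequence[int]], psi: float, p: float) -> float:
    """Sum of per-site log-likelihoods (sites are assumed independent)."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def fit_by_grid(histories: Sequence[Sequence[int]], steps: int = 99) -> Tuple[float, float]:
    """Crude maximum-likelihood fit of (psi, p) over a grid on the open interval (0, 1)."""
    best: Tuple[float, float, float] = (-math.inf, 0.0, 0.0)
    for i in range(1, steps + 1):
        psi = i / (steps + 1)
        for j in range(1, steps + 1):
            p = j / (steps + 1)
            ll = log_likelihood(histories, psi, p)
            if ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]


def naive_occupancy(psi: float, p: float, visits: int) -> float:
    """Expected share of sites with at least one detection, i.e. what raw observations show."""
    return psi * (1.0 - (1.0 - p) ** visits)


if __name__ == "__main__":
    # Four hypothetical sites, three visits each; two sites never record the species.
    histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
    psi_hat, p_hat = fit_by_grid(histories)
    print(f"naive occupancy: {2 / len(histories):.2f}")
    print(f"psi_hat={psi_hat:.2f}, p_hat={p_hat:.2f}")
    print(f"naive occupancy for psi=0.6, p=0.3, 3 visits: {naive_occupancy(0.6, 0.3, 3):.4f}")
```

On the four toy sites the grid fit lands near psi = 0.65 and p = 0.38, above the naive 0.50 share of sites with any detection: the model attributes part of the all-zero histories to missed detections rather than absence. Likewise, `naive_occupancy(0.6, 0.3, 3)` gives 0.3942, showing how raw detections understate true occupancy when p is low. The grid search only keeps the sketch dependency-free; a practical fit would model psi and p with covariates through logit links and use a numerical optimizer.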
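The abstract contrasts site-construction methods that use only geographic distance with ones that also weigh environmental similarity. To illustrate that distinction, the sketch below links any two checklists whose blended geographic and environmental distance falls under a cutoff and treats each connected group as a site, i.e. single-linkage clustering implemented with union-find. This is a simplified stand-in chosen for illustration, not a reproduction of any of the ten approaches the authors compare; the checklist layout, the weight `alpha`, the scale constants, the threshold, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical checklist layout: (x_km, y_km, environmental covariates).
Checklist = Tuple[float, float, Tuple[float, ...]]


def blended_distance(a: Checklist, b: Checklist, alpha: float,
                     geo_scale: float, env_scale: float) -> float:
    """Weighted mix of scaled geographic and environmental Euclidean distance."""
    geo = math.dist(a[:2], b[:2]) / geo_scale
    env = math.dist(a[2], b[2]) / env_scale
    # alpha = 0 uses geography only; alpha = 1 uses environment only.
    return (1.0 - alpha) * geo + alpha * env


def build_sites(checklists: Sequence[Checklist], threshold: float, alpha: float = 0.5,
                geo_scale: float = 1.0, env_scale: float = 1.0) -> List[int]:
    """Single-linkage clustering: link any two checklists within `threshold`, return site ids."""
    parent = list(range(len(checklists)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(checklists)):
        for j in range(i + 1, len(checklists)):
            d = blended_distance(checklists[i], checklists[j], alpha, geo_scale, env_scale)
            if d <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel cluster roots as consecutive site ids in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(checklists))]


def detection_histories(site_ids: Sequence[int], detections: Sequence[int]) -> List[List[int]]:
    """Group per-checklist detections (1/0) into one visit history per site."""
    histories: Dict[int, List[int]] = {}
    for site, y in zip(site_ids, detections):
        histories.setdefault(site, []).append(y)
    return [histories[s] for s in sorted(histories)]


if __name__ == "__main__":
    checklists = [
        (0.0, 0.0, (0.20,)),    # two nearby checklists in similar habitat
        (0.5, 0.0, (0.25,)),
        (0.6, 0.1, (0.90,)),    # close to the first two, but very different habitat
        (10.0, 10.0, (0.20,)),  # a separate pair far away
        (10.3, 10.0, (0.22,)),
    ]
    print(build_sites(checklists, threshold=1.0, alpha=0.0, env_scale=0.1))
    print(build_sites(checklists, threshold=1.0, alpha=0.5, env_scale=0.1))
```

Running the example prints `[0, 0, 0, 1, 1]` for geography only (`alpha=0.0`) and `[0, 0, 1, 2, 2]` when environment gets equal weight: the third checklist is close to the first two but sits in very different habitat, so it becomes its own site. Each resulting site can then be turned into a detection history with `detection_histories` and passed to an occupancy model. Two caveats of this simple choice: single linkage can chain distant checklists together through intermediate ones, and a singleton site contributes only one visit, which says little about detection probability.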
Related papers
- Causal Representation Learning in Temporal Data via Single-Parent Decoding
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
Date: 2024-10-09T15:57:50Z
- Downstream-Pretext Domain Knowledge Traceback for Active Learning
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
Date: 2024-07-20T01:34:13Z
- Predicting Species Occurrence Patterns from Partial Observations
We introduce the problem of predicting species occurrence patterns given (a) satellite imagery, and (b) known information on the occurrence of other species.
To evaluate algorithms on this task, we introduce SatButterfly, a dataset of satellite images, environmental data and observational data for butterflies.
We propose a general model, R-Tran, for predicting species occurrence patterns that enables the use of partial observational data wherever found.
Date: 2024-03-26T18:29:39Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of the constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- LD-SDM: Language-Driven Hierarchical Species Distribution Modeling
We focus on the problem of species distribution modeling using global-scale presence-only data.
To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model.
We propose a novel proximity-aware evaluation metric that enables evaluating species distribution models.
Date: 2023-12-13T18:11:37Z
- SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
Date: 2023-11-02T02:00:27Z
- Spatial Implicit Neural Representations for Global-Scale Species Mapping
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical ranges of 47k species.
Date: 2023-06-05T03:36:01Z
- Improving Heterogeneous Model Reuse by Density Estimation
This paper studies multiparty learning, aiming to learn a model using the private data of different participants.
Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party.
Date: 2023-05-23T09:46:54Z
- Bird Distribution Modelling using Remote Sensing and Citizen Science data
Climate change is a major driver of biodiversity loss.
There are significant knowledge gaps about the distribution of species.
We propose an approach leveraging computer vision to improve species distribution modelling.
Date: 2023-05-01T20:27:11Z
- Towards Adaptive Benthic Habitat Mapping
We show how a habitat model can be used to plan efficient Autonomous Underwater Vehicle (AUV) surveys.
A Bayesian neural network is used to predict visually-derived habitat classes when given broad-scale bathymetric data.
We demonstrate how these structured uncertainty estimates can be utilised to improve the model with fewer samples.
Date: 2020-06-20T01:03:41Z
This list is automatically generated from the titles and abstracts of the papers on this site.