DivShift: Exploring Domain-Specific Distribution Shift in Volunteer-Collected Biodiversity Datasets
- URL: http://arxiv.org/abs/2410.19816v1
- Date: Thu, 17 Oct 2024 23:56:30 GMT
- Title: DivShift: Exploring Domain-Specific Distribution Shift in Volunteer-Collected Biodiversity Datasets
- Authors: Elena Sierra, Lauren E. Gillespie, Salim Soltani, Moises Exposito-Alonso, Teja Kattenborn,
- Abstract summary: We introduce DivShift North American West Coast (DivShift-NAWC), a curated dataset of almost 8 million iNaturalist plant images.
We compare model performance across four known biases and observe that they indeed confound model performance.
We suggest practical strategies for curating datasets to train deep learning models for monitoring climate change's impacts on the world's biodiversity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Climate change is negatively impacting the world's biodiversity. To build automated systems to monitor these negative biodiversity impacts, large-scale, volunteer-collected datasets like iNaturalist are built from community-identified, natural imagery. However, such volunteer-based data are opportunistic and lack a structured sampling strategy, resulting in geographic, temporal, observation quality, and socioeconomic, biases that stymie uptake of these models for downstream biodiversity monitoring tasks. Here we introduce DivShift North American West Coast (DivShift-NAWC), a curated dataset of almost 8 million iNaturalist plant images across the western coast of North America, for exploring the effects of these biases on deep learning model performance. We compare model performance across four known biases and observe that they indeed confound model performance. We suggest practical strategies for curating datasets to train deep learning models for monitoring climate change's impacts on the world's biodiversity.
Related papers
- BeetleVerse: A study on taxonomic classification of ground beetles [0.310688583550805]
Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity.
In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets.
arXiv Detail & Related papers (2025-04-18T01:06:37Z) - Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia.
Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions.
Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
arXiv Detail & Related papers (2024-10-14T17:22:55Z) - Causal Representation Learning in Temporal Data via Single-Parent Decoding [66.34294989334728]
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
arXiv Detail & Related papers (2024-10-09T15:57:50Z) - A Deep Learning-Based Approach for Mangrove Monitoring [0.0]
This work provides a comprehensive evaluation of recent deep-learning models on the task of mangrove segmentation.
We first introduce and make available a novel open-source dataset, MagSet-2, incorporating mangrove annotations from the Global Mangrove Watch and satellite images from Sentinel-2.
We then benchmark three architectural groups, namely convolutional, transformer, and mamba models, using the created dataset.
arXiv Detail & Related papers (2024-10-07T19:22:08Z) - Fine-tuning of Geospatial Foundation Models for Aboveground Biomass Estimation [2.3429628556845405]
Fine-tuning of a geospatial foundation model to estimate above-ground biomass has comparable performance to a U-Net trained from scratch.
We also explore the transfer-learning capabilities of the models by fine-tuning on satellite imagery with sparse labels from different eco-regions in Brazil.
arXiv Detail & Related papers (2024-06-28T12:54:10Z) - Imbalance-aware Presence-only Loss Function for Species Distribution
Modeling [3.4306175858244794]
This study assesses the effectiveness of training deep learning models using a balanced presence-only loss function on large citizen science-based datasets.
We demonstrate that this imbalance-aware loss function outperforms traditional loss functions across various datasets and tasks.
arXiv Detail & Related papers (2024-03-12T10:08:36Z) - SatBird: Bird Species Distribution Modeling with Remote Sensing and
Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z) - Evaluating and Incentivizing Diverse Data Contributions in Collaborative
Learning [89.21177894013225]
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset.
We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium.
We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population.
arXiv Detail & Related papers (2023-06-08T23:38:25Z) - Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
arXiv Detail & Related papers (2023-06-05T03:36:01Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - Bird Distribution Modelling using Remote Sensing and Citizen Science
data [31.375576105932442]
Climate change is a major driver of biodiversity loss.
There are significant knowledge gaps about the distribution of species.
We propose an approach leveraging computer vision to improve species distribution modelling.
arXiv Detail & Related papers (2023-05-01T20:27:11Z) - A Comparative Study on Generative Models for High Resolution Solar
Observation Imaging [59.372588316558826]
This work investigates capabilities of current state-of-the-art generative models to accurately capture the data distribution behind observed solar activity states.
Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high quality samples indistinguishable to human experts.
arXiv Detail & Related papers (2023-04-14T14:40:32Z) - Neuroevolution-based Classifiers for Deforestation Detection in Tropical
Forests [62.997667081978825]
Millions of hectares of tropical forests are lost every year due to deforestation or degradation.
Monitoring and deforestation detection programs are in use, in addition to public policies for the prevention and punishment of criminals.
This paper proposes the use of pattern classifiers based on neuroevolution technique (NEAT) in tropical forest deforestation detection tasks.
arXiv Detail & Related papers (2022-08-23T16:04:12Z) - Ensembles of Vision Transformers as a New Paradigm for Automated
Classification in Ecology [0.0]
We show that ensembles of Data-efficient image Transformers (DeiTs) significantly outperform the previous state of the art (SOTA)
On all the data sets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%.
arXiv Detail & Related papers (2022-03-03T14:16:22Z) - Jalisco's multiclass land cover analysis and classification using a
novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolution Neural Network (ConvNet) to make LC classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z) - Meta-Learning for Few-Shot Land Cover Classification [3.8529010979482123]
We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks.
We find that few-shot model adaptation outperforms pre-training with regular gradient descent.
This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences.
arXiv Detail & Related papers (2020-04-28T09:42:41Z) - Automatic image-based identification and biomass estimation of
invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art Resnet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.