On the selection and effectiveness of pseudo-absences for species
distribution modeling with deep learning
- URL: http://arxiv.org/abs/2401.02989v1
- Date: Wed, 3 Jan 2024 16:06:30 GMT
- Title: On the selection and effectiveness of pseudo-absences for species
distribution modeling with deep learning
- Authors: Robin Zbinden, Nina van Tiel, Benjamin Kellenberger, Lloyd Hughes,
Devis Tuia
- Abstract summary: Species distribution modeling is a versatile tool for understanding the relationship between environmental conditions and species occurrences.
To overcome the lack of confirmed absences, a common approach is to employ pseudo-absences, which are specific geographic locations designated as negative samples.
In this paper, we demonstrate that these challenges can be effectively tackled by integrating pseudo-absences in the training of multi-species neural networks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Species distribution modeling is a highly versatile tool for understanding
the intricate relationship between environmental conditions and species
occurrences. However, the available data often lacks information on confirmed
species absence and is limited to opportunistically sampled, presence-only
observations. To overcome this limitation, a common approach is to employ
pseudo-absences, which are specific geographic locations designated as negative
samples. While pseudo-absences are well-established for single-species
distribution models, their application in the context of multi-species neural
networks remains underexplored. Notably, the significant class imbalance
between species presences and pseudo-absences is often left unaddressed.
Moreover, the existence of different types of pseudo-absences (e.g., random and
target-group background points) adds complexity to the selection process.
Determining the optimal combination of pseudo-absence types is difficult and
depends on the characteristics of the data, particularly considering that
certain types of pseudo-absences can be used to mitigate geographic biases. In
this paper, we demonstrate that these challenges can be effectively tackled by
integrating pseudo-absences in the training of multi-species neural networks
through modifications to the loss function. This adjustment involves assigning
different weights to the distinct terms of the loss function, thereby
addressing both the class imbalance and the choice of pseudo-absence types.
Additionally, we propose a strategy to set these loss weights using spatial
block cross-validation with presence-only data. We evaluate our approach using
a benchmark dataset containing independent presence-absence data from six
different regions and report improved results when compared to competing
approaches.
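To make the loss modification concrete, below is a minimal sketch of a binary cross-entropy whose presence and pseudo-absence terms carry separate weights. The mask layout, the weight names (`w_presence`, `w_random`, `w_target`), and the PyTorch framing are illustrative assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def weighted_pseudo_absence_loss(
    logits,       # (batch, n_species) raw scores from a multi-species network
    presence,     # (batch, n_species) 1 where a species presence was observed
    random_bg,    # (batch, n_species) 1 for random background pseudo-absences
    target_bg,    # (batch, n_species) 1 for target-group background pseudo-absences
    w_presence=1.0,
    w_random=1.0,
    w_target=1.0,
):
    """Binary cross-entropy whose presence and pseudo-absence terms carry
    separate weights, so both the presence/absence class imbalance and the
    mix of pseudo-absence types are controlled through the loss."""
    log_p = F.logsigmoid(logits)       # log sigma(z), presence term
    log_not_p = F.logsigmoid(-logits)  # log(1 - sigma(z)), absence term
    loss = -(
        w_presence * presence * log_p
        + w_random * random_bg * log_not_p
        + w_target * target_bg * log_not_p
    )
    return loss.sum() / logits.shape[0]

# Toy usage: presences are far rarer than pseudo-absences, so upweighting
# the presence term (e.g., by the number of species) is one plausible way
# to counter the class imbalance the abstract describes.
n_species = 100
logits = torch.randn(8, n_species)
presence = torch.zeros(8, n_species)
presence[0, 3] = 1.0
random_bg = torch.zeros(8, n_species)
random_bg[4:] = 1.0  # rows sampled uniformly at random across the region
target_bg = torch.zeros(8, n_species)
print(weighted_pseudo_absence_loss(logits, presence, random_bg, target_bg,
                                   w_presence=float(n_species)))
```
The weights themselves are what the paper tunes with spatial block cross-validation on presence-only data. A plain grid search over candidate weights, scored on geographically held-out blocks, is one way such a selection could look; `train_eval_fn` below is a hypothetical callback that trains a model with the given weights and returns its mean validation score over the held-out blocks.
```python
from itertools import product

def select_loss_weights(train_eval_fn, candidates=(0.1, 1.0, 10.0)):
    """Grid search over loss weights; train_eval_fn(w_p, w_r, w_t) is assumed
    to train on the non-held-out spatial blocks and return a presence-only
    validation score averaged over the held-out blocks."""
    best_weights, best_score = None, float("-inf")
    for w_p, w_r, w_t in product(candidates, repeat=3):
        score = train_eval_fn(w_p, w_r, w_t)
        if score > best_score:
            best_weights, best_score = (w_p, w_r, w_t), score
    return best_weights
```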
Related papers
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Nonparametric Identifiability of Causal Representations from Unknown Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Deep Variational Lesion-Deficit Mapping [0.3914676152740142]
We introduce a comprehensive framework for lesion-deficit model comparison.
We show that our model outperforms established methods by a substantial margin across all simulation scenarios.
Our analysis justifies the widespread adoption of this approach.
arXiv Detail & Related papers (2023-05-27T13:49:35Z)
- The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference with Partially Labeled Data [10.021381302215062]
In real-world scenarios, data collection limitations often result in partially labeled datasets, leading to difficulties in drawing reliable causal inferences.
Traditional approaches in the semi-supervised (SS) and missing data literature may not adequately handle these complexities, leading to biased estimates.
The proposed decaying missing-at-random framework tackles missing outcomes in high-dimensional settings and accounts for selection bias.
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
- Leveraging Relational Information for Learning Weakly Disentangled Representations [11.460692362624533]
Disentanglement is a difficult property to enforce in neural representations.
We present an alternative view on learning (weakly) disentangled representations.
arXiv Detail & Related papers (2022-05-20T09:58:51Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning, instantiated on structured spaces, with classical results on causal inference provides an effective and practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Investigate the Essence of Long-Tailed Recognition from a Unified Perspective [11.080317683184363]
Deep recognition models often suffer from long-tailed data distributions due to heavily imbalanced sample numbers across categories.
In this work, we demonstrate that long-tailed recognition suffers from both sample number and category similarity.
arXiv Detail & Related papers (2021-07-08T11:08:40Z)