FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation
- URL: http://arxiv.org/abs/2510.19305v1
- Date: Wed, 22 Oct 2025 07:09:36 GMT
- Title: FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation
- Authors: Chirag Padubidri, Pranesh Velmurugan, Andreas Lanitis, Andreas Kamilaris
- Abstract summary: Species Distribution Modelling (SDM) helps predict species presence across large regions. In this study, we enhance SDM accuracy for frogs (Anura) by applying deep learning and data imputation techniques. Experiments show that data balancing significantly improved model performance, reducing the Mean Absolute Error (MAE) from 189 to 29 in frog counting tasks.
- Score: 0.9537146822132906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monitoring species distribution is vital for conservation efforts, enabling the assessment of environmental impacts and the development of effective preservation strategies. Traditional data collection methods, including citizen science, offer valuable insights but remain limited in coverage and completeness. Species Distribution Modelling (SDM) helps address these gaps by using occurrence data and environmental variables to predict species presence across large regions. In this study, we enhance SDM accuracy for frogs (Anura) by applying deep learning and data imputation techniques using data from the "EY - 2022 Biodiversity Challenge." Our experiments show that data balancing significantly improved model performance, reducing the Mean Absolute Error (MAE) from 189 to 29 in frog counting tasks. Feature selection identified key environmental factors influencing occurrence, optimizing inputs while maintaining predictive accuracy. The multimodal ensemble model, integrating land cover, NDVI, and other environmental inputs, outperformed individual models and showed robust generalization across unseen regions. The fusion of image and tabular data improved both frog counting and habitat classification, achieving 84.9% accuracy with an AUC of 0.90. This study highlights the potential of multimodal learning and data preprocessing techniques such as balancing and imputation to improve predictive ecological modeling when data are sparse or incomplete, contributing to more precise and scalable biodiversity monitoring.
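The abstract does not say how pseudo-absences are generated or how the data are balanced. The sketch below shows one common approach under stated assumptions, not the authors' method: pseudo-absences are drawn as random background points inside a bounding box while excluding a buffer around every presence record, and the classes are then balanced by undersampling the majority class. The function names (`sample_pseudo_absences`, `balance`, `mean_absolute_error`) are hypothetical, and distance is plain Euclidean in degrees, a simplification of geodesic distance.

```python
import math
import random


def sample_pseudo_absences(presences, bbox, n, min_dist, seed=0):
    """Draw n random background points inside bbox = (lon_min, lat_min, lon_max, lat_max)
    that lie at least min_dist (in degrees, Euclidean; a simplification) from every presence."""
    rng = random.Random(seed)
    lon_min, lat_min, lon_max, lat_max = bbox
    absences = []
    attempts = 0
    while len(absences) < n:
        attempts += 1
        if attempts > 100 * n:
            # Guard against an infinite loop when the buffer covers most of the box.
            raise RuntimeError("could not place enough pseudo-absences; shrink min_dist")
        point = (rng.uniform(lon_min, lon_max), rng.uniform(lat_min, lat_max))
        if all(math.dist(point, q) >= min_dist for q in presences):
            absences.append(point)
    return absences


def balance(presences, absences, seed=0):
    """Undersample the larger class so both classes contribute the same number of rows.
    Returns a shuffled list of (point, label) pairs with label 1 = presence, 0 = absence."""
    rng = random.Random(seed)
    k = min(len(presences), len(absences))
    rows = [(p, 1) for p in rng.sample(presences, k)]
    rows += [(a, 0) for a in rng.sample(absences, k)]
    rng.shuffle(rows)
    return rows


def mean_absolute_error(y_true, y_pred):
    """MAE, the metric reported for the frog counting task."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


presences = [(151.0, -33.8), (151.2, -33.9)]
absences = sample_pseudo_absences(presences, (150.5, -34.5, 151.5, -33.5), n=5, min_dist=0.05)
training_rows = balance(presences, absences)
```

Undersampling is only one balancing option; oversampling the minority class or weighting the loss are common alternatives, and the abstract does not say which was used.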
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
(2026-01-29T14:06:50Z)
- Multi-environment Invariance Learning with Missing Data [0.0]
In this work, we establish non-asymptotic guarantees on the variable selection property and $\ell$ error convergence rates. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset.
(2026-01-12T06:30:58Z)
- GREAT: Generalizable Representation Enhancement via Auxiliary Transformations for Zero-Shot Environmental Prediction [27.332543340013814]
We introduce Generalizable Representation Enhancement via Auxiliary Transformations (GREAT), a framework that effectively augments available datasets to improve predictions in completely unseen regions. We demonstrate GREAT's effectiveness on stream temperature prediction across six ecologically diverse watersheds in the eastern U.S.
(2025-11-17T15:11:03Z)
- BATIS: Bayesian Approaches for Targeted Improvement of Species Distribution Models [15.029163153558533]
Species distribution models (SDMs) aim to predict species occurrence based on environmental variables. Recent deep learning advances for SDMs have been shown to perform well on complex and heterogeneous datasets. We introduce BATIS, a novel and practical framework wherein prior predictions are updated iteratively using limited observational data.
(2025-10-22T16:42:46Z)
- Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy [0.9999629695552196]
The present work develops and validates a data-driven and interpretable machine-learning framework designed to predict strokes. Ten routinely gathered demographic, lifestyle, and clinical variables were sourced from a public cohort of 4,981 records. The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model.
(2025-05-18T21:46:45Z)
- MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling [3.428447509258587]
Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling. We introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species.
(2025-03-17T11:02:28Z)
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia. Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
(2024-10-14T17:22:55Z)
- Beyond Tides and Time: Machine Learning Triumph in Water Quality [0.0]
This study aims to establish a robust predictive pipeline accessible to both data science experts and those without domain-specific knowledge.
(2023-09-29T03:33:53Z)
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
(2023-04-20T18:07:19Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on two real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, outperforming the best of various state-of-the-art baselines by up to 19%.
(2020-10-22T02:28:11Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
(2020-06-26T13:50:19Z)