Combining Observational Data and Language for Species Range Estimation
- URL: http://arxiv.org/abs/2410.10931v1
- Date: Mon, 14 Oct 2024 17:22:55 GMT
- Title: Combining Observational Data and Language for Species Range Estimation
- Authors: Max Hamilton, Christian Lange, Elijah Cole, Alexander Shepard, Samuel Heinrich, Oisin Mac Aodha, Grant Van Horn, Subhransu Maji
- Abstract summary: We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia.
Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions.
Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
- Score: 63.65684199946094
- Abstract: Species range maps (SRMs) are essential tools for research and policy-making in ecology, conservation, and environmental management. However, traditional SRMs rely on the availability of environmental covariates and high-quality species location observation data, both of which can be challenging to obtain due to geographic inaccessibility and resource constraints. We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia, covering habitat preferences and range descriptions for tens of thousands of species. Our framework maps locations, species, and text descriptions into a common space, facilitating the learning of rich spatial covariates at a global scale and enabling zero-shot range estimation from textual descriptions. Evaluated on held-out species, our zero-shot SRMs significantly outperform baselines and match the performance of SRMs obtained using tens of observations. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data. We present extensive quantitative and qualitative analyses of the learned representations in the context of range estimation and other spatial tasks, demonstrating the effectiveness of our approach.
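As a rough illustration of the shared-space idea described in the abstract, the sketch below embeds locations and a text description into one vector space and scores each location against the text. Everything here is assumed for illustration: the encoders `encode_location` and `encode_text`, the embedding size, and the dot-product-plus-sigmoid scoring rule are hypothetical placeholders, not the authors' architecture. The weights are random and untrained, so the scores only show the data flow and are not real range estimates.

```python
import zlib
import numpy as np

EMBED_DIM = 16  # assumed size of the shared embedding space

# Hypothetical location encoder weights: a fixed random projection.
# A real model would learn these from observation data.
_W_LOC = np.random.default_rng(0).standard_normal((4, EMBED_DIM))


def encode_location(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) to vectors in the shared space."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Sinusoidal features keep the encoding continuous across the date line.
    feats = np.stack(
        [np.sin(lat), np.cos(lat), np.sin(lon), np.cos(lon)], axis=-1
    )
    return feats @ _W_LOC  # shape (..., EMBED_DIM)


def encode_text(text):
    """Placeholder text encoder: sum of per-token pseudo-random vectors.

    Stands in for a pretrained language model; it carries no semantics.
    """
    vec = np.zeros(EMBED_DIM)
    for token in text.lower().split():
        seed = zlib.crc32(token.encode("utf-8"))  # deterministic per token
        vec += np.random.default_rng(seed).standard_normal(EMBED_DIM)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def zero_shot_range(text, lat_deg, lon_deg):
    """Score each location against a text description (values in [0, 1])."""
    logits = encode_location(lat_deg, lon_deg) @ encode_text(text)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


lats = np.array([-30.0, 0.0, 45.0])
lons = np.array([150.0, 20.0, -100.0])
scores = zero_shot_range("found in dry eucalyptus woodland", lats, lons)
print(scores.shape)  # (3,)
```

In this sketch, no observations of the species are needed at query time: only its description is encoded, which is the sense in which the estimate is zero-shot.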
Related papers
- GeoPlant: Spatial Plant Species Prediction Dataset [4.817737198128259]
Species Distribution Models (SDMs) predict species across space from spatially explicit features.
We have designed and developed a new European-scale dataset for SDMs at high spatial resolution (10-50 m).
The dataset comprises 5M heterogeneous Presence-Only records and 90k exhaustive Presence-Absence survey records.
arXiv Detail & Related papers (2024-08-25T20:09:46Z) - Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms [17.802456388479616]
We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia.
This dataset presents a challenging task due to the overlap and distribution of grass species.
The dataset and code will be made publicly available, aiming to drive research in computer vision, machine learning, and ecological studies.
arXiv Detail & Related papers (2024-07-25T18:27:27Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - LD-SDM: Language-Driven Hierarchical Species Distribution Modeling [9.620416509546471]
We focus on the problem of species distribution modeling using global-scale presence-only data.
To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model.
We propose a novel proximity-aware evaluation metric that enables evaluating species distribution models.
arXiv Detail & Related papers (2023-12-13T18:11:37Z) - SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species.
arXiv Detail & Related papers (2023-06-05T03:36:01Z) - StatEcoNet: Statistical Ecology Neural Networks for Species Distribution Modeling [8.534315844706367]
This paper focuses on a core task in computational sustainability and statistical ecology: species distribution modeling (SDM).
In SDM, the occurrence pattern of a species on a landscape is predicted by environmental features based on observations at a set of locations.
To address the unique challenges of SDM, this paper proposes a framework called StatEcoNet.
arXiv Detail & Related papers (2021-02-17T02:19:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.