Towards AI-Guided Open-World Ecological Taxonomic Classification
- URL: http://arxiv.org/abs/2512.18994v1
- Date: Mon, 22 Dec 2025 03:20:05 GMT
- Title: Towards AI-Guided Open-World Ecological Taxonomic Classification
- Authors: Cheng Yaw Low, Heejoon Koo, Jaewoo Park, Kaleb Mesfin Asfaw, Meeyoung Cha
- Abstract summary: TaxoNet is an embedding-based encoder with a dual-margin penalization loss that strengthens learning signals from rare underrepresented taxa. Our findings show that general-purpose multimodal foundation models remain constrained in plant-domain applications.
- Score: 25.577016053193862
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: AI-guided classification of ecological families, genera, and species underpins global sustainability efforts such as biodiversity monitoring, conservation planning, and policy-making. Progress toward this goal is hindered by long-tailed taxonomic distributions from class imbalance, along with fine-grained taxonomic variations, test-time spatiotemporal domain shifts, and closed-set assumptions that can only recognize previously seen taxa. We introduce the Open-World Ecological Taxonomy Classification, a unified framework that captures the co-occurrence of these challenges in realistic ecological settings. To address them, we propose TaxoNet, an embedding-based encoder with a dual-margin penalization loss that strengthens learning signals from rare underrepresented taxa while mitigating the dominance of overrepresented ones, directly confronting interrelated challenges. We evaluate our method on diverse ecological domains: Google Auto-Arborist (urban trees), iNat-Plantae (Plantae observations from various ecosystems in iNaturalist-2019), and NAFlora-Mini (a curated herbarium collection). Our model consistently outperforms baselines, particularly for rare taxa, establishing a strong foundation for open-world plant taxonomic monitoring. Our findings further show that general-purpose multimodal foundation models remain constrained in plant-domain applications.
Related papers
- LabelKAN -- Kolmogorov-Arnold Networks for Inter-Label Learning: Avian Community Learning [15.708656410014685]
We introduce LabelKAN, a novel framework based on Kolmogorov-Arnold Networks (KANs) to learn inter-label connections from predictions of each label. When modeling avian species distributions, LabelKAN achieves substantial gains in predictive performance across the vast majority of species.
arXiv Detail & Related papers (2026-01-23T15:50:50Z) - Evolving Graph Learning for Out-of-Distribution Generalization in Non-stationary Environments [61.62036321848316]
Graph neural networks (GNNs) have shown remarkable success in exploiting the spatial and temporal patterns on dynamic graphs. Existing GNNs perform poorly under distribution shifts, which are inevitable in dynamic scenarios. This paper proposes the Evolving Graph Learning framework for evolving graph generalization (Evoal) via environment-aware invariant pattern recognition.
arXiv Detail & Related papers (2025-11-04T08:22:29Z) - Continental-scale habitat distribution modelling with multimodal earth observation foundation models [0.0]
Habitats integrate the abiotic conditions, vegetation composition and structure that support biodiversity and sustain nature's contributions to people. Current habitat maps often fall short in thematic or spatial resolution. We evaluated how high-resolution remote sensing (RS) data and Artificial Intelligence (AI) tools can improve habitat mapping.
arXiv Detail & Related papers (2025-07-13T18:11:26Z) - BioAnalyst: A Foundation Model for Biodiversity [0.565395466029518]
We introduce BioAnalyst, the first Foundation Model tailored for biodiversity analysis and conservation planning. BioAnalyst employs a transformer-based architecture, pretrained on extensive multi-modal datasets. We evaluate the model's performance on two downstream use cases, demonstrating its generalisability compared to existing methods.
arXiv Detail & Related papers (2025-07-11T23:56:08Z) - TerraIncognita: A Dynamic Benchmark for Species Discovery Using Frontier Models [15.272215321742802]
Current methods for insect species discovery are manual, slow, and severely constrained by taxonomic expertise. We introduce TerraIncognita, a benchmark designed to evaluate state-of-the-art multimodal models on this challenging problem. Our benchmark dataset combines expertly annotated images of insect species likely known to frontier AI models with images of rare and poorly known species.
arXiv Detail & Related papers (2025-05-29T15:20:15Z) - Feedforward Few-shot Species Range Estimation [61.60698161072356]
Knowing where a particular species can or cannot be found on Earth is crucial for ecological research and conservation efforts. Accurate range estimates are only available for a relatively small proportion of all known species. We outline a new approach for few-shot species range estimation to address the challenge of accurately estimating the range of a species from limited data.
arXiv Detail & Related papers (2025-02-20T19:13:29Z) - Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia. Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
arXiv Detail & Related papers (2024-10-14T17:22:55Z) - FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems [56.0640340392818]
We introduce a framework, FREE, that enables the use of varying features and available information to train a universal model. The core idea is to map available environmental data into a text space and then convert the traditional predictive modeling task in environmental science to a semantic recognition problem. Our evaluation on two societally important real-world applications, stream water temperature prediction and crop yield prediction, demonstrates the superiority of FREE over multiple baselines.
arXiv Detail & Related papers (2023-11-17T00:53:09Z) - SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z) - Neuroevolution-based Classifiers for Deforestation Detection in Tropical Forests [62.997667081978825]
Millions of hectares of tropical forests are lost every year due to deforestation or degradation.
Monitoring and deforestation detection programs are in use, in addition to public policies for the prevention and punishment of criminals.
This paper proposes the use of pattern classifiers based on a neuroevolution technique (NEAT) in tropical forest deforestation detection tasks.
arXiv Detail & Related papers (2022-08-23T16:04:12Z) - Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology [0.0]
We show that ensembles of Data-efficient image Transformers (DeiTs) significantly outperform the previous state of the art (SOTA).
On all the data sets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%.
arXiv Detail & Related papers (2022-03-03T14:16:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.