Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning
- URL: http://arxiv.org/abs/2602.08282v1
- Date: Mon, 09 Feb 2026 05:23:22 GMT
- Title: Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning
- Authors: Haixu Liu, Yufei Wang, Tianxiang Xu, Chuancheng Shi, Hongsheng Xing,
- Abstract summary: Presence-Only (PO) data offer broad spatial coverage and rich distribution but suffer from severe label noise in negative samples.<n>This paper proposes a multimodal fusion framework that fully leverages the strengths of both PA and PO data.<n> Experiments on the GeoLifeCLEF 2025 demonstrate that our approach achieves superior predictive performance in scenarios with limited PA coverage and pronounced distribution shifts.
- Score: 4.888851550406879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale, cross-species plant distribution prediction plays a crucial role in biodiversity conservation, yet modeling efforts in this area still face significant challenges due to the sparsity and bias of observational data. Presence-Absence (PA) data provide accurate and noise-free labels, but are costly to obtain and limited in quantity; Presence-Only (PO) data, by contrast, offer broad spatial coverage and rich spatiotemporal distribution, but suffer from severe label noise in negative samples. To address these real-world constraints, this paper proposes a multimodal fusion framework that fully leverages the strengths of both PA and PO data. We introduce an innovative pseudo-label aggregation strategy for PO data based on the geographic coverage of satellite imagery, enabling geographic alignment between the label space and remote sensing feature space. In terms of model architecture, we adopt Swin Transformer Base as the backbone for satellite imagery, utilize the TabM network for tabular feature extraction, retain the Temporal Swin Transformer for time-series modeling, and employ a stackable serial tri-modal cross-attention mechanism to optimize the fusion of heterogeneous modalities. Furthermore, empirical analysis reveals significant geographic distribution shifts between PA training and test samples, and models trained by directly mixing PO and PA data tend to experience performance degradation due to label noise in PO data. To address this, we draw on the mixture-of-experts paradigm: test samples are partitioned according to their spatial proximity to PA samples, and different models trained on distinct datasets are used for inference and post-processing within each partition. Experiments on the GeoLifeCLEF 2025 dataset demonstrate that our approach achieves superior predictive performance in scenarios with limited PA coverage and pronounced distribution shifts.
Related papers
- Calibrated Probabilistic Interpolation for GEDI Biomass [0.0]
We introduce Attentive Neural Processes (ANPs), a probabilistic meta-learning framework that explicitly conditions predictions on local observation sets.<n>We validate this approach across five distinct biomes ranging from Tropical Amazonian forests to Boreal and Alpine ecosystems.<n>This work provides a scalable, theoretically rigorous alternative to ensemble variance for continental scale earth observation.
arXiv Detail & Related papers (2026-01-23T15:35:33Z) - Leveraging Optimal Transport for Distributed Two-Sample Testing: An Integrated Transportation Distance-based Framework [2.65043787671605]
This paper introduces a novel framework for distributed two-sample testing using the Integrated Transportation Distance (ITD)<n>A permutation test procedure is proposed for implementation in distributed settings, allowing for efficient computation while preserving data privacy.<n>Results indicate that ITD effectively aggregates information across distributed clients, detecting subtle distributional shifts that might be missed when examining individual clients.
arXiv Detail & Related papers (2025-06-19T05:59:48Z) - Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction [20.1863553357121]
Current deep learning architectures for remote sensing are fundamentally rigid.<n>We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling.<n> STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands.<n>It unifies various dense prediction tasks and diverse semantic class predictions.
arXiv Detail & Related papers (2025-05-18T07:39:17Z) - Un-mixing Test-time Adaptation under Heterogeneous Data Streams [21.40129321379529]
Test-Time Adaptation (TTA) has emerged as a promising solution for deep model adaptation.<n>We propose FreDA, a novel Frequency-based Decentralized Adaptation framework.
arXiv Detail & Related papers (2024-11-16T12:29:59Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings.
In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities.
We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - A Self-Organizing Clustering System for Unsupervised Distribution Shift Detection [1.0436203990235575]
We propose a continual learning framework for monitoring and detecting distribution changes.
In particular, we investigate the projections made by two topology-preserving maps: the Self-Organizing Map and the Scale Invariant Map.
Our method can be applied in both a supervised and an unsupervised context.
arXiv Detail & Related papers (2024-04-25T14:48:29Z) - Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI [29.13807697733638]
We build on the remarkable achievements in generative sampling of natural images.
We propose an innovative challenge, potentially overly ambitious, which involves generating samples that resemble images.
The statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects.
arXiv Detail & Related papers (2024-04-10T22:35:06Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Linking data separation, visual separation, and classifier performance
using pseudo-labeling by contrastive learning [125.99533416395765]
We argue that the performance of the final classifier depends on the data separation present in the latent space and visual separation present in the projection.
We demonstrate our results by the classification of five real-world challenging image datasets of human intestinal parasites with only 1% supervised samples.
arXiv Detail & Related papers (2023-02-06T10:01:38Z) - Breaking the Spurious Causality of Conditional Generation via Fairness
Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z) - PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and
Localization [64.39761523935613]
We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images.
PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding.
It also exploits correlations between the different semantic levels of CNN to better localize anomalies.
arXiv Detail & Related papers (2020-11-17T17:29:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.