Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification
- URL: http://arxiv.org/abs/2510.08269v1
- Date: Thu, 09 Oct 2025 14:26:09 GMT
- Title: Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification
- Authors: Chenying Liu, Gianmarco Perantoni, Lorenzo Bruzzone, Xiao Xiang Zhu,
- Abstract summary: Multi-label classification (MLC) offers a more comprehensive semantic understanding of Remote Sensing (RS) imagery. Single-positive multi-label learning (SPML) has emerged, where each image is annotated with only one relevant label, and the model is expected to recover the full set of labels. We propose Adaptive Gradient Calibration (AdaGC) as a novel and generalizable SPML framework tailored to RS imagery.
- Score: 20.29420915336209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label classification (MLC) offers a more comprehensive semantic understanding of Remote Sensing (RS) imagery compared to traditional single-label classification (SLC). However, obtaining complete annotations for MLC is particularly challenging due to the complexity and high cost of the labeling process. As a practical alternative, single-positive multi-label learning (SPML) has emerged, where each image is annotated with only one relevant label, and the model is expected to recover the full set of labels. While scalable, SPML introduces significant supervision ambiguity, demanding specialized solutions for model training. Although various SPML methods have been proposed in the computer vision domain, research in the RS context remains limited. To bridge this gap, we propose Adaptive Gradient Calibration (AdaGC), a novel and generalizable SPML framework tailored to RS imagery. AdaGC adopts a gradient calibration (GC) mechanism combined with Mixup and a dual exponential moving average (EMA) module for robust pseudo-label generation. To maximize AdaGC's effectiveness, we introduce a simple yet theoretically grounded indicator to adaptively trigger GC after an initial warm-up stage based on training dynamics, thereby guaranteeing the effectiveness of GC in mitigating overfitting to label noise. Extensive experiments on two benchmark RS datasets under two distinct label noise types demonstrate that AdaGC achieves state-of-the-art (SOTA) performance while maintaining strong robustness across diverse settings.
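The abstract's core mechanism pairs Mixup with a dual exponential moving average (EMA) for robust pseudo-label generation. The paper's exact formulation is not reproduced here; below is a minimal numpy sketch of the general idea, where the decay rates, threshold, and class names are illustrative assumptions, not values from the paper: two EMAs of the model's per-class scores (one fast, one slow) are maintained, and a class is accepted as a pseudo-positive only when both averages agree above a threshold.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Standard Mixup: a convex combination of two inputs and their label vectors."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

class DualEMA:
    """Track per-class prediction scores with a fast and a slow EMA; emit a
    pseudo-positive only where both averages agree above a threshold.
    Decay rates and threshold here are illustrative, not the paper's values."""
    def __init__(self, n_classes, fast=0.9, slow=0.99, thresh=0.7):
        self.f, self.s, self.t = fast, slow, thresh
        self.fast_avg = np.zeros(n_classes)
        self.slow_avg = np.zeros(n_classes)

    def update(self, probs):
        # Fast EMA tracks recent outputs; slow EMA is more stable to noise.
        self.fast_avg = self.f * self.fast_avg + (1 - self.f) * probs
        self.slow_avg = self.s * self.slow_avg + (1 - self.s) * probs

    def pseudo_labels(self):
        agree = (self.fast_avg > self.t) & (self.slow_avg > self.t)
        return agree.astype(float)
```

Requiring agreement between the two averages is one common way to keep pseudo-labels stable early in training, which is consistent with the warm-up-then-trigger schedule the abstract describes.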
Related papers
- MacNet: An End-to-End Manifold-Constrained Adaptive Clustering Network for Interpretable Whole Slide Image Classification [9.952997875404634]
Clustering-based approaches can provide an explainable decision-making process but suffer from high-dimensional features and semantically ambiguous centroids. We propose an end-to-end MIL framework that integrates Grassmann re-embedding and manifold adaptive clustering. Experiments on multicentre WSI datasets demonstrate that: 1) our cluster-incorporated model achieves superior performance in both grading accuracy and interpretability; 2) end-to-end learning refines better feature representations while requiring acceptable resources.
arXiv Detail & Related papers (2026-02-16T06:43:36Z) - Semi-Supervised Hyperspectral Image Classification with Edge-Aware Superpixel Label Propagation and Adaptive Pseudo-Labeling [5.022329161015679]
We propose a novel semi-supervised hyperspectral classification framework integrating spatial prior information with a dynamic learning mechanism. We introduce a Dynamic History-Fused Prediction (DHP) method that smooths pseudo-label fluctuations and improves temporal consistency and noise resistance. The Dynamic Reliability-Enhanced Pseudo-Label Framework (DREPL) strengthens pseudo-label stability across temporal and sample domains.
arXiv Detail & Related papers (2026-01-26T00:31:08Z) - Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection [65.29550320117526]
We propose a novel framework named FineGrainedAD to improve anomaly localization performance. Experiments demonstrate that the proposed FineGrainedAD achieves superior overall performance in few-shot settings.
arXiv Detail & Related papers (2025-10-30T13:09:00Z) - Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z) - HyperTTA: Test-Time Adaptation for Hyperspectral Image Classification under Distribution Shifts [28.21559601586271]
HyperTTA (Test-Time Adaptable Transformer for Hyperspectral Degradation) is a unified framework that enhances model robustness under diverse degradation conditions. Its test-time adaptation strategy, the Confidence-aware Entropy-minimized LayerNorm Adapter (CELA), dynamically updates only the affine parameters of LayerNorm layers. Experiments on two benchmark datasets demonstrate that HyperTTA outperforms state-of-the-art baselines across a wide range of degradation scenarios.
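The idea of adapting only the LayerNorm affine parameters at test time, gated by prediction confidence, can be sketched compactly. The following is a minimal numpy illustration, not the paper's implementation: the gating threshold, learning rate, and the use of finite-difference gradients are all simplifying assumptions for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def mean_entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def cela_step(x, gamma, beta, W, ent_gate=10.0, lr=0.05, h=1e-4):
    """One confidence-gated test-time step: only the LayerNorm affine
    parameters (gamma, beta) move; the classifier head W stays frozen.
    Gradients are estimated by central finite differences for brevity."""
    def loss(g, b):
        return mean_entropy(softmax(layer_norm(x, g, b) @ W))
    if loss(gamma, beta) > ent_gate:  # very uncertain batch: skip update
        return gamma, beta
    g_grad = np.zeros_like(gamma)
    b_grad = np.zeros_like(beta)
    for i in range(gamma.size):
        e = np.zeros_like(gamma); e[i] = h
        g_grad[i] = (loss(gamma + e, beta) - loss(gamma - e, beta)) / (2 * h)
        b_grad[i] = (loss(gamma, beta + e) - loss(gamma, beta - e)) / (2 * h)
    return gamma - lr * g_grad, beta - lr * b_grad
```

Restricting the update to the two affine vectors keeps the adapted parameter count tiny relative to the backbone, which is what makes this kind of test-time adaptation cheap and comparatively stable.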
arXiv Detail & Related papers (2025-09-10T09:31:37Z) - AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework. It eliminates irrelevant patches through a two-step training process. It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z) - Unbiased Max-Min Embedding Classification for Transductive Few-Shot Learning: Clustering and Classification Are All You Need [83.10178754323955]
Few-shot learning enables models to generalize from only a few labeled examples. We propose the Unbiased Max-Min Embedding Classification (UMMEC) method, which addresses the key challenges in few-shot learning. Our method significantly improves classification performance with minimal labeled data, advancing the state of the art.
arXiv Detail & Related papers (2025-03-28T07:23:07Z) - Transformer-Driven Active Transfer Learning for Cross-Hyperspectral Image Classification [3.087068801861429]
Hyperspectral image (HSI) classification presents inherent challenges due to high spectral dimensionality, significant domain shifts, and limited availability of labeled data. We propose a novel Active Transfer Learning (ATL) framework built upon a Spatial-Spectral Transformer (SST) backbone. The framework integrates multistage transfer learning with an uncertainty-diversity-driven active learning mechanism.
arXiv Detail & Related papers (2024-11-27T07:53:39Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Slot-Mixup with Subsampling: A Simple Regularization for WSI Classification [13.286360560353936]
Whole slide image (WSI) classification requires repetitive zoom-in and out for pathologists, as only small portions of the slide may be relevant to detecting cancer.
Due to the lack of patch-level labels, multiple instance learning (MIL) is a common practice for training a WSI classifier.
One of the challenges in MIL for WSIs is the weak supervision coming only from the slide-level labels, often resulting in severe overfitting.
Our approach augments the training dataset by sampling a subset of patches in the WSI without significantly altering the underlying semantics of the original slides.
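The subsampling-plus-mixup regularization described above can be sketched in a few lines. This is an illustrative numpy sketch, not the paper's code; the keep ratio, the Beta parameter, and the assumption that slot representations are plain arrays are all hypothetical simplifications.

```python
import numpy as np

def subsample_patches(patch_feats, keep_ratio=0.5, rng=None):
    """Keep a random subset of a slide's patch embeddings; the slide-level
    label is assumed unchanged since most patches carry redundant signal."""
    rng = rng or np.random.default_rng()
    n = patch_feats.shape[0]
    k = max(1, int(n * keep_ratio))
    idx = rng.choice(n, size=k, replace=False)
    return patch_feats[idx]

def slot_mixup(slots_a, y_a, slots_b, y_b, alpha=1.0, rng=None):
    """Mixup applied to pooled slot representations of two slides and
    to their slide-level labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * slots_a + (1 - lam) * slots_b, lam * y_a + (1 - lam) * y_b
```

Mixing at the slot level rather than the raw-patch level sidesteps the fact that two slides have different numbers of patches, since the pooled slot tensors share a fixed shape.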
arXiv Detail & Related papers (2023-11-29T09:18:39Z) - Deep Learning From Crowdsourced Labels: Coupled Cross-entropy Minimization, Identifiability, and Regularization [21.32957828106532]
A deep learning-based end-to-end (E2E) system uses noisy crowdsourced labels from multiple annotators.
Many E2E systems co-train the neural classifier with multiple annotator-specific "label confusion" layers.
This work presents performance guarantees of the CCEM criterion and two regularized variants of the CCEM.
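The coupled cross-entropy idea behind confusion layers can be written down directly: the classifier's clean-label posterior is pushed through each annotator's row-stochastic confusion matrix before being scored against that annotator's noisy labels. The sketch below is a generic numpy illustration of that coupling, not the paper's CCEM implementation; the function and argument names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ccem_loss(logits, annotator_labels, confusions):
    """Couple one clean-label classifier with per-annotator confusion
    matrices: the posterior p is mapped through each row-stochastic A_m
    and scored by cross-entropy against that annotator's noisy labels."""
    p = softmax(logits)                          # (N, K) clean-label posterior
    total = 0.0
    for A, y in zip(confusions, annotator_labels):
        noisy = p @ A                            # (N, K) predicted noisy dist.
        total += -np.log(noisy[np.arange(len(y)), y] + 1e-12).mean()
    return total / len(confusions)
```

During training both the classifier and the confusion matrices are learned jointly; the identifiability results referenced above concern when this factorization recovers the true clean-label posterior.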
arXiv Detail & Related papers (2023-06-05T22:21:26Z) - Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation [75.32213865436442]
We propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy label and multi-class generalization issues.
The MDBA model can reach the mIoU of 69.5% and 70.2% on validation and test sets for the PASCAL VOC 2012 dataset.
arXiv Detail & Related papers (2023-05-09T03:33:43Z) - Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC).
We find that most LT-MLC and PL-MLC approaches fail to solve this degraded MLC problem.
We propose an end-to-end learning framework, COMIC: COrrection → ModIfication → balanCe.
arXiv Detail & Related papers (2023-04-20T20:05:08Z) - Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features [14.334304670606633]
We propose a novel algorithm, Aligned Dual moDality ClaSsifier (ADDS), which includes a Dual-Modal decoder (DM-decoder) with alignment between visual and textual features.
Extensive experiments conducted on several standard benchmarks, NUS-WIDE, ImageNet-1k, ImageNet-21k, and MS-COCO, demonstrate that our approach significantly outperforms previous methods.
arXiv Detail & Related papers (2022-08-19T22:45:07Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
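The cross-view self-supervision described for SLRNet, predicting from several views of an image and distilling a shared pseudo-label, follows a common pattern that can be sketched briefly. This is a generic numpy illustration of view averaging with temperature sharpening, not SLRNet's actual attentive low-rank formulation; the temperature and loss form are assumptions.

```python
import numpy as np

def cross_view_pseudo_label(view_probs, temperature=0.5):
    """Average the predictions from several augmented views, then sharpen
    the average with a temperature to form the shared pseudo-label."""
    mean = np.mean(view_probs, axis=0)       # (K,) averaged prediction
    sharp = mean ** (1.0 / temperature)
    return sharp / sharp.sum()

def consistency_loss(view_probs, pseudo):
    """Mean squared deviation of each view's prediction from the pseudo-label."""
    diffs = np.asarray(view_probs) - pseudo
    return float((diffs ** 2).mean())
```

Averaging across views tends to cancel view-specific noise, which is why such pseudo-labels are usually more precise than any single-view prediction.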
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.