Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification
- URL: http://arxiv.org/abs/2512.03745v1
- Date: Wed, 03 Dec 2025 12:43:16 GMT
- Title: Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification
- Authors: Jiaze Li, Yan Lu, Bin Liu, Guojun Yin, Mang Ye,
- Abstract summary: We propose a Dual-level Modality Debiasing Learning framework that implements debiasing at both the model and optimization levels.<n>Experiments on benchmark datasets demonstrate that DMDL could enable modality-invariant feature learning and a more generalized model.
- Score: 59.59359638389348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-stage learning pipeline has achieved promising results in unsupervised visible-infrared person re-identification (USL-VI-ReID). It first performs single-modality learning and then operates cross-modality learning to tackle the modality discrepancy. Although promising, this pipeline inevitably introduces modality bias: modality-specific cues learned in the single-modality training naturally propagate into the following cross-modality learning, impairing identity discrimination and generalization. To address this issue, we propose a Dual-level Modality Debiasing Learning (DMDL) framework that implements debiasing at both the model and optimization levels. At the model level, we propose a Causality-inspired Adjustment Intervention (CAI) module that replaces likelihood-based modeling with causal modeling, preventing modality-induced spurious patterns from being introduced, leading to a low-biased model. At the optimization level, a Collaborative Bias-free Training (CBT) strategy is introduced to interrupt the propagation of modality bias across data, labels, and features by integrating modality-specific augmentation, label refinement, and feature alignment. Extensive experiments on benchmark datasets demonstrate that DMDL could enable modality-invariant feature learning and a more generalized model.
Related papers
- Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration [40.720288165545476]
We introduce an enhanced diffusion model as a pluggable mid-stage training module to effectively restore missing features.<n>Our strategy introduces two key innovations: (I) Dynamic Modality Gating, which adaptively leverages conditional features to steer the generation of semantically consistent features; (II) Cross-Modal Mutual Learning mechanism, which bridges the semantic spaces of dual encoders to achieve bidirectional alignment.
arXiv Detail & Related papers (2026-02-03T06:06:35Z) - DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities [28.992992584085787]
DIS2 is a new paradigm shifting from modality-shared feature dependence to active, guided missing features compensation.<n> Compensatory features are explicitly captured which, when fused with the features of the available modality, approximate the ideal fused representation of the full-modality case.<n>Our proposed approach significantly outperforms state-of-the-art methods across benchmarks.
arXiv Detail & Related papers (2026-01-20T01:33:54Z) - Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation [61.248535801314375]
Subset-Selected Counterfactual Augmentation (SS-CA)<n>We develop Counterfactual LIMA to identify minimal spatial region sets whose removal can selectively alter model predictions.<n>Experiments show that SS-CA improves generalization on in-distribution (ID) test data and achieves superior performance on out-of-distribution (OOD) benchmarks.
arXiv Detail & Related papers (2025-11-15T08:39:22Z) - DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation [53.323488295994395]
Source-free active domain adaptation (SFADA) enhances knowledge transfer from a source model to an unlabeled target domain using limited manual labels selected via active learning.<n>We propose Dual Active learning with Multimodal (DAM) foundation model, a novel framework that integrates multimodal supervision from a ViL model to complement sparse human annotations.<n>Extensive experiments demonstrate that DAM consistently outperforms existing methods and sets a new state-of-the-art across multiple SFADA benchmarks and active learning strategies.
arXiv Detail & Related papers (2025-09-29T15:06:56Z) - MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models.<n>MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z) - Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration [15.711365331854614]
We introduce Dynamic Adaptation of Reasoning Trajectories (DART), a novel data adaptation framework.<n>Instead of uniformly imitating expert steps, DART employs a selective imitation strategy guided by step-wise adaptability estimation.<n>We validate DART across multiple reasoning benchmarks and model scales, demonstrating that it significantly improves generalization and data efficiency.
arXiv Detail & Related papers (2025-05-27T04:08:11Z) - Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data.
Specifically, we propose a $beta$-VAE based module to extract factors as the initial nodes of the graph.
By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z) - Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation [11.195959019678314]
Consistency learning is a central strategy to tackle unlabeled data in semi-supervised medical image segmentation.
In this paper, we propose an Adaptive Bidirectional Displacement approach to solve the above challenge.
arXiv Detail & Related papers (2024-05-01T08:17:43Z) - Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.