Related papers: Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval

URL: http://arxiv.org/abs/2511.07780v1
Date: Wed, 12 Nov 2025 01:17:46 GMT
Title: Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval
Authors: Likang Peng, Chao Su, Wenyuan Wu, Yuan Sun, Dezhong Peng, Xi Peng, Xu Wang
Abstract summary: Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities. In real-world scenarios, label noise is prevalent and severely degrades retrieval performance. We propose a novel framework named Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH).
Score: 37.4688414628963
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities (e.g., image and text) by encoding data into compact binary representations. While recent methods have achieved remarkable performance, they often rely heavily on fully annotated datasets, which are costly and labor-intensive to obtain. In real-world scenarios, particularly in multi-label datasets, label noise is prevalent and severely degrades retrieval performance. Moreover, existing CMH approaches typically overlook the partial semantic overlaps inherent in multi-label data, limiting their robustness and generalization. To tackle these challenges, we propose a novel framework named Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH). The framework comprises two complementary modules: (1) Cross-modal Semantic-Consistent Classification (CSCC), which leverages cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels; (2) Bidirectional Soft Contrastive Hashing (BSCH), which dynamically generates soft contrastive sample pairs based on multi-label semantic overlap, enabling adaptive contrastive learning between semantically similar and dissimilar samples across modalities. Extensive experiments on four widely-used cross-modal retrieval benchmarks validate the effectiveness and robustness of our method, consistently outperforming state-of-the-art approaches under noisy multi-label conditions.
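
The abstract mentions soft contrastive pairs derived from multi-label semantic overlap, and retrieval over compact binary codes. The following is a minimal illustrative sketch of those two ideas only, not the SCBCH implementation: the Jaccard overlap measure, the function names, and the toy label vectors are all assumptions made for illustration, since the abstract does not specify how overlap is computed.

```python
from typing import List, Sequence


def label_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard overlap between two multi-hot label vectors.

    Assumption: Jaccard is used here only as one plausible overlap measure.
    """
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of differing bits between two equal-length binary codes."""
    return sum(1 for x, y in zip(u, v) if x != y)


def soft_pair_weights(labels: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise similarity targets in [0, 1] rather than hard 0/1 pairs."""
    return [[label_overlap(a, b) for b in labels] for a in labels]


# Hypothetical multi-hot labels for three samples over four classes.
labels = [
    [1, 1, 0, 0],  # sample 0
    [1, 0, 0, 0],  # sample 1: shares one label with sample 0
    [0, 0, 1, 1],  # sample 2: shares no label with sample 0
]
weights = soft_pair_weights(labels)
```

With these toy labels, samples 0 and 1 get a graded target of 0.5 instead of being treated as a fully positive or fully negative pair, which is the kind of partial semantic overlap the abstract says hard pairings overlook.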

Related papers

Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification [55.56234913868664]
We propose Test-time Adaptive Hierarchical Co-enhanced Denoising Network (TAHCD) for reliable learning on multimodal data. The proposed method achieves superior classification performance, robustness, and generalization compared with state-of-the-art reliable multimodal learning approaches.
arXiv Detail & Related papers (2026-01-12T03:14:12Z)
Neighbor-aware Instance Refining with Noisy Labels for Cross-Modal Retrieval [12.062625455647265]
Cross-Modal Retrieval (CMR) has made significant progress in the field of multi-modal analysis. CMR methods often fail to simultaneously satisfy model performance ceilings, calibration reliability, and data utilization rate. We propose a novel robust cross-modal learning framework, namely Neighbor-aware Instance Refining with Noisy Labels (NIRNL).
arXiv Detail & Related papers (2025-12-30T08:19:07Z)
PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning [17.302186298424836]
Cross-modal retrieval aims to align different modalities via semantic similarity. Existing methods often assume that image-text pairs are perfectly aligned, overlooking Noisy Correspondences in real data.
arXiv Detail & Related papers (2025-09-19T05:41:17Z)
SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation [18.223854197580145]
Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task. We propose a novel framework based on FixMatch, named SemSim, powered by two appealing designs from semantic similarity perspective. We show that SemSim yields consistent improvements over the state-of-the-art methods across three public segmentation benchmarks.
arXiv Detail & Related papers (2024-10-17T12:31:37Z)
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance. We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation [18.276988929148143]
We explore the potential of large-scale noisily labeled data to enhance feature learning by pretraining semantic segmentation models. Unlike conventional pretraining approaches, CromSS exploits massive amounts of noisy and easy-to-come-by labels for improved feature learning.
arXiv Detail & Related papers (2024-05-02T11:58:06Z)
A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels [22.2715520667186]
Cross-modal retrieval (CMR) aims to establish interaction between different modalities. This work proposes UOT-RCL, a Unified framework based on Optimal Transport (OT) for Robust Cross-modal Retrieval. Experiments on three widely-used cross-modal retrieval datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-20T10:34:40Z)
Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID [56.573905143954015]
We propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters. Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at a cluster-level. Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:27:46Z)
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency [66.8685113725007]
BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree. Experiments on three popular cross-modal matching datasets demonstrate that BiCro significantly improves the noise-robustness of various matching models.
arXiv Detail & Related papers (2023-03-22T09:33:50Z)
Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search. A new rank-consistency objective is applied to align the similarity orders from two spaces. A powerful loss function is designed to penalize the samples whose semantic similarity and hamming distance are mismatched.
arXiv Detail & Related papers (2021-02-02T13:46:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.