REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for
Noisy Correspondence
- URL: http://arxiv.org/abs/2403.08224v1
- Date: Wed, 13 Mar 2024 04:01:20 GMT
- Title: REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for
Noisy Correspondence
- Authors: Ruochen Zheng, Jiahao Hong, Changxin Gao, Nong Sang
- Abstract summary: The presence of noise in acquired data invariably leads to performance degradation in cross-modal matching.
We propose a framework termed Rank corrElation and noisy Pair hAlf-replacing wIth memoRy (REPAIR) to tackle the mismatched data pair issue.
- Score: 36.274879585424635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The presence of noise in acquired data invariably leads to performance
degradation in cross-modal matching. Unfortunately, obtaining precise
annotations in the multimodal field is expensive, which has prompted some
methods to tackle the mismatched data pair issue in cross-modal matching
contexts, termed noisy correspondence. However, most of these existing noisy
correspondence methods exhibit the following limitations: a) the problem of
self-reinforcing error accumulation, and b) improper handling of noisy data
pairs. To tackle these two problems, we propose a generalized framework termed
Rank corrElation and noisy Pair hAlf-replacing wIth memoRy (REPAIR), which
benefits from maintaining a memory bank for features of matched pairs.
Specifically, we calculate the distances between the features in the memory
bank and those of the target pair for each respective modality, and use the
rank correlation of these two sets of distances to estimate the soft
correspondence label of the target pair. Estimating soft correspondence based
on memory bank features rather than using a similarity network can avoid the
accumulation of errors due to incorrect network identifications. For pairs that
are completely mismatched, REPAIR searches the memory bank for the most
matching feature to replace one feature of one modality, instead of using the
original pair directly or merely discarding the mismatched pair. We conduct
experiments on three cross-modal datasets, i.e., Flickr30K, MSCOCO, and CC152K,
proving the effectiveness and robustness of our REPAIR on synthetic and
real-world noise.
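The two steps described in the abstract (soft-label estimation from rank correlation, and half-replacing a mismatched pair) can be illustrated with a minimal sketch. This is not the authors' implementation. The choices below are assumptions made only for illustration: Euclidean distance, Spearman-style rank correlation, clipping negative correlations to zero to get a label in [0, 1], and replacing the text side by the text paired with the nearest image in the memory bank. The function names `soft_label` and `half_replace` are hypothetical.

```python
import numpy as np


def _ranks(values):
    # Ordinal ranks 0..n-1. Ties are broken by position, which is
    # acceptable for continuous-valued distances.
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def soft_label(img_feat, txt_feat, bank_img, bank_txt):
    """Estimate a soft correspondence label for one (image, text) pair.

    bank_img[i] and bank_txt[i] are features of the i-th matched pair
    held in the memory bank.
    """
    # Distances from the target pair to every bank entry, per modality.
    d_img = np.linalg.norm(bank_img - img_feat, axis=1)
    d_txt = np.linalg.norm(bank_txt - txt_feat, axis=1)
    # Spearman-style correlation: Pearson correlation of the ranks
    # (assumed here; other rank correlation measures would also fit).
    rho = np.corrcoef(_ranks(d_img), _ranks(d_txt))[0, 1]
    # Assumption: negative correlation is treated as "no correspondence".
    return float(max(rho, 0.0))


def half_replace(img_feat, bank_img, bank_txt):
    """Keep the image feature and swap in a text feature from the bank.

    Assumption: the "most matching" entry is the bank pair whose image
    feature is nearest to the target image feature.
    """
    idx = int(np.argmin(np.linalg.norm(bank_img - img_feat, axis=1)))
    return img_feat, bank_txt[idx]


# Toy usage with random features (illustrative data only).
rng = np.random.default_rng(0)
bank_img = rng.normal(size=(50, 8))
bank_txt = bank_img.copy()  # toy assumption: both modalities share a space
query = rng.normal(size=8)
label_matched = soft_label(query, query, bank_img, bank_txt)
label_random = soft_label(query, rng.normal(size=8), bank_img, bank_txt)
new_img, new_txt = half_replace(query, bank_img, bank_txt)
```

In this toy setup a pair whose two features agree yields a label near 1, while an unrelated pair yields a lower label. How low a label must be before a pair counts as completely mismatched is not specified in the abstract, so any threshold would be an additional assumption.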
Related papers
- NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval [16.460121977322224]
Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query, which combines an image with modification text to pinpoint the target.
Pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors.
We propose the Noise-aware Contrastive Learning for CIR (NCL-CIR) comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB).
(2025-04-06T03:27:23Z)
- ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning [42.292388966066135]
We propose a general Relation Consistency learning framework, namely ReCon, to accurately discriminate the true correspondences among the multimodal data.
ReCon significantly enhances its effectiveness for true correspondence discrimination and therefore reliably filters out the mismatched pairs.
(2025-02-27T10:38:03Z)
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
(2024-08-10T09:49:55Z)
- Diff-Reg v1: Diffusion Matching Model for Registration Problem [34.57825794576445]
Existing methods commonly leverage geometric or semantic point features to generate potential correspondences.
Previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios.
We introduce a diffusion matching model for robust correspondence estimation.
(2024-03-29T02:10:38Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
(2023-12-27T09:03:43Z)
- Negative Pre-aware for Noisy Cross-modal Matching [46.5591267410225]
Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify.
We present a novel Negative Pre-aware Cross-modal matching solution for large visual-language model fine-tuning on noisy downstream tasks.
(2023-12-10T05:52:36Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
(2023-11-07T08:27:14Z)
- BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency [66.8685113725007]
BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree.
Experiments on three popular cross-modal matching datasets demonstrate that BiCro significantly improves the noise-robustness of various matching models.
(2023-03-22T09:33:50Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
(2023-01-19T16:31:13Z)
- Noise-Tolerant Learning for Audio-Visual Action Recognition [31.641972732424463]
Video datasets are usually coarse-annotated or collected from the Internet.
We propose a noise-tolerant learning framework to find anti-interference model parameters against both noisy labels and noisy correspondence.
Our method significantly improves the robustness of the action recognition model and surpasses the baselines by a clear margin.
(2022-05-16T12:14:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.