Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- URL: http://arxiv.org/abs/2403.05105v1
- Date: Fri, 8 Mar 2024 07:09:30 GMT
- Title: Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang
- Abstract summary: In real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs)
Previous efforts tend to mitigate this problem by estimating a soft correspondence to down-weight the contribution of PMPs.
We propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs.
- Score: 49.07523607316323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting well-matched multimedia datasets is crucial for training
cross-modal retrieval models. However, in real-world scenarios, massive
multimodal data are harvested from the Internet, which inevitably contains
Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data
will remarkably harm the cross-modal retrieval performance. Previous efforts
tend to mitigate this problem by estimating a soft correspondence to
down-weight the contribution of PMPs. In this paper, we aim to address this
challenge from a new perspective: the potential semantic similarity among
unpaired samples makes it possible to excavate useful knowledge from mismatched
pairs. To achieve this, we propose L2RM, a general framework based on Optimal
Transport (OT) that learns to rematch mismatched pairs. In detail, L2RM aims to
generate refined alignments by seeking a minimal-cost transport plan across
different modalities. To formalize the rematching idea in OT, first, we propose
a self-supervised cost function that automatically learns an explicit
similarity-to-cost mapping. Second, we model a partial OT problem that
restricts transport among false positives to further boost
refined alignments. Extensive experiments on three benchmarks demonstrate our
L2RM significantly improves the robustness against PMPs for existing models.
The code is available at https://github.com/hhc1997/L2RM.
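The core idea of rematching via a minimal-cost transport plan can be illustrated with entropic-regularized OT solved by Sinkhorn iterations. The sketch below is a minimal illustration, not the paper's implementation: it uses a fixed negative-cosine-similarity cost (L2RM instead *learns* the similarity-to-cost mapping), plain NumPy features stand in for image/text embeddings, and all variable names are hypothetical.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic-regularized OT via Sinkhorn iterations.

    cost: (m, n) cost matrix between image and text features.
    a, b: marginal distributions over the two modalities.
    Returns a transport plan T whose marginals match a and b.
    """
    K = np.exp(-cost / eps)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # scale columns to match b
        u = a / (K @ v)               # scale rows to match a
    return u[:, None] * K * v[None, :]

# Toy example: 4 (possibly mismatched) image-text pairs.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))         # stand-in image embeddings
txt = rng.normal(size=(4, 8))         # stand-in text embeddings

# Fixed cost = 1 - cosine similarity (L2RM learns this mapping instead).
img_n = img / np.linalg.norm(img, axis=1, keepdims=True)
txt_n = txt / np.linalg.norm(txt, axis=1, keepdims=True)
cost = 1.0 - img_n @ txt_n.T

a = np.full(4, 0.25)                  # uniform marginals
b = np.full(4, 0.25)
T = sinkhorn(cost, a, b)
rematched = T.argmax(axis=1)          # refined image -> text alignment
```

A partial-OT variant, as the abstract describes, would additionally constrain the plan so that transported mass flows only among samples flagged as false positives, leaving confidently matched pairs untouched.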
Related papers
- Robust Multimodal Learning via Representation Decoupling [6.7678581401558295]
Multimodal learning has attracted increasing attention due to its practicality.
Existing methods tend to address it by learning a common subspace representation for different modality combinations.
We propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning.
arXiv Detail & Related papers (2024-07-05T12:09:33Z) - A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels [22.2715520667186]
Cross-modal retrieval (CMR) aims to establish interaction between different modalities.
This work proposes UOT-RCL, a Unified framework based on Optimal Transport (OT) for Robust Cross-modal Retrieval.
Experiments on three widely-used cross-modal retrieval datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-20T10:34:40Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning [18.45898471459533]
Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments.
This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest.
arXiv Detail & Related papers (2023-04-08T05:20:33Z) - BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency [66.8685113725007]
BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree.
experiments on three popular cross-modal matching datasets demonstrate that BiCro significantly improves the noise-robustness of various matching models.
arXiv Detail & Related papers (2023-03-22T09:33:50Z) - FeDXL: Provable Federated Learning for Deep X-Risk Optimization [105.17383135458897]
We tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing algorithms are applicable.
The challenges in designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines.
arXiv Detail & Related papers (2022-10-26T00:23:36Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Learning to Match Distributions for Domain Adaptation [116.14838935146004]
This paper proposes Learning to Match (L2M) to automatically learn the cross-domain distribution matching.
L2M reduces the inductive bias by using a meta-network to learn the distribution matching loss in a data-driven way.
Experiments on public datasets substantiate the superiority of L2M over SOTA methods.
arXiv Detail & Related papers (2020-07-17T03:26:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.