NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
- URL: http://arxiv.org/abs/2503.10526v1
- Date: Thu, 13 Mar 2025 16:33:55 GMT
- Title: NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
- Authors: Zengrong Lin, Zheng Wang, Tianwen Qian, Pan Mu, Sixian Chan, Cong Bai,
- Abstract summary: NeighborRetr is a novel method that balances the learning of hubs and adaptively adjusts the relations of various kinds of neighbors.<n>We show that NeighborRetr achieves state-of-the-art results on multiple cross-modal retrieval benchmarks.
- Score: 15.409022911063241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal retrieval aims to bridge the semantic gap between different modalities, such as visual and textual data, enabling accurate retrieval across them. Despite significant advancements with models like CLIP that align cross-modal representations, a persistent challenge remains: the hubness problem, where a small subset of samples (hubs) dominate as nearest neighbors, leading to biased representations and degraded retrieval accuracy. Existing methods often mitigate hubness through post-hoc normalization techniques, relying on prior data distributions that may not be practical in real-world scenarios. In this paper, we directly mitigate hubness during training and introduce NeighborRetr, a novel method that effectively balances the learning of hubs and adaptively adjusts the relations of various kinds of neighbors. Our approach not only mitigates the hubness problem but also enhances retrieval performance, achieving state-of-the-art results on multiple cross-modal retrieval benchmarks. Furthermore, NeighborRetr demonstrates robust generalization to new domains with substantial distribution shifts, highlighting its effectiveness in real-world applications. We make our code publicly available at: https://github.com/zzezze/NeighborRetr .
Related papers
- RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval [32.06421737874828]
Reconstruction Relations Embedded Hashing (RREH) is designed for semi-paired cross-modal retrieval tasks.
RREH assumes that multi-modal data share a common subspace.
anchors are sampled from paired data, which improves the efficiency of hash learning.
arXiv Detail & Related papers (2024-05-28T03:12:54Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks.
We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
arXiv Detail & Related papers (2023-04-25T04:33:27Z) - Far Away in the Deep Space: Dense Nearest-Neighbor-Based
Out-of-Distribution Detection [33.78080060234557]
Nearest-Neighbors approaches have been shown to work well in object-centric data domains.
We show that nearest-neighbor approaches also yield state-of-the-art results on dense novelty detection in complex driving scenes.
arXiv Detail & Related papers (2022-11-12T13:32:19Z) - Prototype-Based Layered Federated Cross-Modal Hashing [14.844848099134648]
In this paper, we propose a novel method called prototype-based layered federated cross-modal hashing.
Specifically, the prototype is introduced to learn the similarity between instances and classes on server.
To realize personalized federated learning, a hypernetwork is deployed on server to dynamically update different layers' weights of local model.
arXiv Detail & Related papers (2022-10-27T15:11:12Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Multi-Modal Mutual Information Maximization: A Novel Approach for
Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH)
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z) - Learning Robust Representation for Clustering through Locality
Preserving Variational Discriminative Network [16.259673823482665]
Variational Deep Embedding achieves great success in various clustering tasks.
VaDE suffers from two problems: 1) it is fragile to the input noise; 2) it ignores the locality information between the neighboring data points.
We propose a joint learning framework that improves VaDE with a robust embedding discriminator and a local structure constraint.
arXiv Detail & Related papers (2020-12-25T02:31:55Z) - Cross-Domain Generalization Through Memorization: A Study of Nearest
Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important to increase efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios of StackExchange, Spring and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z) - CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named textbfComprehensive stextbfImilarity textbfMining and ctextbfOnsistency leartextbfNing (CIMON)
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z) - PushNet: Efficient and Adaptive Neural Message Passing [1.9121961872220468]
Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs.
Existing methods perform synchronous message passing along all edges in multiple subsequent rounds.
We consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence.
arXiv Detail & Related papers (2020-03-04T18:15:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.