Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
- URL: http://arxiv.org/abs/2508.02538v1
- Date: Mon, 04 Aug 2025 15:45:48 GMT
- Title: Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
- Authors: Zhengxin Pan, Haishuai Wang, Fangyu Wu, Peng Zhang, Jiajun Bu,
- Abstract summary: Hubness is a phenomenon where a small number of targets frequently appear as nearest neighbors to numerous queries. Despite several proposed methods to reduce hubness, their underlying mechanisms remain poorly understood. We propose a probability-balancing framework for more effective hubness reduction.
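A common way to quantify the hubness described above is the k-occurrence statistic. For each target, count how many queries place it among their k nearest neighbors. A strongly right-skewed distribution of those counts signals that a few targets act as hubs. The following is a minimal pure-Python sketch of that statistic. It is not taken from the paper or its repository, and the function names `k_occurrence` and `skewness` and the toy similarity matrix are hypothetical.

```python
def k_occurrence(sim, k=1):
    """For each target (column), count how many queries (rows) rank it in their top-k."""
    n_targets = len(sim[0])
    counts = [0] * n_targets
    for row in sim:
        # Indices of the k targets most similar to this query (ties broken by lower index).
        top_k = sorted(range(n_targets), key=lambda j: row[j], reverse=True)[:k]
        for j in top_k:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness (third standardized moment); large positive values suggest hubs."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


# Hypothetical toy similarities: 4 queries x 3 targets, where target 0 attracts most queries.
sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.7, 0.1, 0.6],
    [0.2, 0.3, 0.4],
]
counts = k_occurrence(sim, k=1)
print(counts, round(skewness(counts), 3))  # → [3, 0, 1] 0.382
```

Here target 0 is the top-1 neighbor of three of the four queries, which produces a positive skew in the counts.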
- Score: 12.329352187335312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The past decade has witnessed rapid advancements in cross-modal retrieval, with significant progress made in accurately measuring the similarity between cross-modal pairs. However, the persistent hubness problem, a phenomenon where a small number of targets frequently appear as nearest neighbors to numerous queries, continues to hinder the precision of similarity measurements. Despite several proposed methods to reduce hubness, their underlying mechanisms remain poorly understood. To bridge this gap, we analyze the widely adopted Inverted Softmax approach and demonstrate its effectiveness in balancing target probabilities during retrieval. Building on these insights, we propose a probability-balancing framework for more effective hubness reduction. We contend that balancing target probabilities alone is inadequate and, therefore, extend the framework to balance both query and target probabilities by introducing Sinkhorn Normalization (SN). Notably, we extend SN to scenarios where the true query distribution is unknown, showing that current methods, which rely solely on a query bank to estimate target hubness, produce suboptimal results due to a significant distributional gap between the query bank and targets. To mitigate this issue, we introduce Dual Bank Sinkhorn Normalization (DBSN), incorporating a corresponding target bank alongside the query bank to narrow this distributional gap. Our comprehensive evaluation across various cross-modal retrieval tasks, including image-text retrieval, video-text retrieval, and audio-text retrieval, demonstrates consistent performance improvements, validating the effectiveness of both SN and DBSN. All code is publicly available at https://github.com/ppanzx/DBSN.
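Both Inverted Softmax and Sinkhorn Normalization can be read as rescalings of exp(βS) for a query-by-target similarity matrix S. The sketch below is an illustrative pure-Python reading of the abstract, not the authors' implementation (that code is in the linked repository). `inverted_softmax` divides each target column by the mass it attracts from a bank of queries, which balances target probabilities. `sinkhorn` alternates row and column normalization so that query and target marginals both become uniform. The temperature `beta`, the iteration count, the toy matrix, and the choice to reuse the test queries as their own bank are assumptions. The dual-bank (DBSN) variant is not reproduced here.

```python
import math


def inverted_softmax(test_sim, bank_sim, beta=20.0):
    """Column-normalize exp(beta * S) over a bank of queries.

    test_sim: Q x T similarities between evaluation queries and targets.
    bank_sim: B x T similarities between bank queries and the same targets.
    Targets that are close to many bank queries get a larger denominator.
    """
    n_targets = len(test_sim[0])
    denom = [sum(math.exp(beta * row[j]) for row in bank_sim) for j in range(n_targets)]
    return [[math.exp(beta * row[j]) / denom[j] for j in range(n_targets)]
            for row in test_sim]


def sinkhorn(sim, beta=20.0, n_iters=50):
    """Sinkhorn-Knopp scaling of exp(beta * S).

    Alternately rescales rows and columns; converges toward a matrix whose rows
    each sum to 1/Q and whose columns each sum to 1/T.
    """
    K = [[math.exp(beta * s) for s in row] for row in sim]
    n_rows, n_cols = len(K), len(K[0])
    for _ in range(n_iters):
        # Row step: give every query the same total mass 1/n_rows.
        for i in range(n_rows):
            r = sum(K[i]) * n_rows
            K[i] = [v / r for v in K[i]]
        # Column step: give every target the same total mass 1/n_cols.
        for j in range(n_cols):
            c = sum(K[i][j] for i in range(n_rows)) * n_cols
            for i in range(n_rows):
                K[i][j] /= c
    return K


def top1(scores):
    """Index of the highest-scoring target for each query."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Hypothetical toy data (assumption): 4 queries x 3 targets; target 0 is a hub.
sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.7, 0.1, 0.6],
    [0.2, 0.3, 0.4],
]
print("raw top-1:     ", top1(sim))
print("IS top-1:      ", top1(inverted_softmax(sim, sim, beta=5.0)))
print("Sinkhorn top-1:", top1(sinkhorn(sim, beta=5.0, n_iters=200)))
```

On this toy matrix, Inverted Softmax with `beta=5.0` already moves queries 1 and 2 off the hub target 0. The top-1 assignment becomes `[0, 1, 2, 2]` instead of `[0, 0, 0, 2]`. Sinkhorn additionally constrains the query side, so every query carries the same total mass. In the setting the abstract describes, the true queries are unknown and a separate query bank (plus, for DBSN, a target bank) would supply the normalizing statistics.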
Related papers
- Uncertainty-Masked Bernoulli Diffusion for Camouflaged Object Detection Refinement [24.522233459116354]
Camouflaged Object Detection (COD) presents inherent challenges due to subtle visual differences between targets and their backgrounds.
We propose the Uncertainty-Masked Bernoulli Diffusion (UMBD) model, the first generative refinement framework specifically designed for COD.
UMBD introduces an uncertainty-guided masking mechanism that selectively applies Bernoulli diffusion to residual regions with poor segmentation quality.
arXiv Detail & Related papers (2025-06-12T14:02:18Z)
- NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval [15.409022911063241]
NeighborRetr is a novel method that balances the learning of hubs and adaptively adjusts the relations of various kinds of neighbors.
We show that NeighborRetr achieves state-of-the-art results on multiple cross-modal retrieval benchmarks.
arXiv Detail & Related papers (2025-03-13T16:33:55Z)
- Evaluating the Security of Merkle Trees in the Internet of Things: An Analysis of Data Falsification Probabilities [27.541105686358378]
This paper develops a theoretical framework to calculate the probability of data falsification, taking into account various scenarios based on the length of the Merkle path and hash length.
Empirical experiments validate the theoretical models, exploring simulations with diverse hash lengths and Merkle path lengths.
The findings reveal a decrease in falsification probability with increasing hash length and an inverse relationship with longer Merkle paths.
arXiv Detail & Related papers (2024-04-18T11:24:12Z)
- Trade-off between Bagging and Boosting for quantum separability-entanglement classification [0.0]
The pros and cons of the proposed random under-sampling boost CHA (RUSBCHA) for the quantum separability problem are compared.
The outcomes suggest that RUSBCHA is an alternative to the BCHA approach.
arXiv Detail & Related papers (2024-01-22T15:29:35Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks [5.164924773752648]
Hubness is a phenomenon where a small number of gallery data points are frequently retrieved, resulting in a decline in retrieval performance.
We show the necessity of incorporating both the gallery and query data for addressing hubness, as hubs always exhibit high similarity with both gallery and query data.
We present extensive experimental results on diverse language-grounded benchmarks, including text-image, text-video, and text-audio.
arXiv Detail & Related papers (2023-10-17T22:10:17Z)
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework for sequential recommendation based on Mutual WasserStein discrepancy minimization (MStein).
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization [73.04187954213471]
We introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval.
The proposed method has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline.
arXiv Detail & Related papers (2022-11-14T14:25:40Z)
- Distributionally Robust Bayesian Optimization with $\varphi$-divergences [45.48814080654241]
We consider robustness against data shift in $\varphi$-divergences, which subsumes many popular choices, such as the Total Variation and the extant Kullback-Leibler divergence.
We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds.
arXiv Detail & Related papers (2022-03-04T04:34:52Z)
- Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors of up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
- Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) requires both robustness and accuracy at the same time.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z)
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation [85.13713217986738]
We present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.
Not surprisingly, the proposed BSN++ ranked first on the CVPR 2019 ActivityNet challenge leaderboard for the temporal action localization task.
arXiv Detail & Related papers (2020-09-15T07:08:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.