Isotropic Representation Can Improve Dense Retrieval
- URL: http://arxiv.org/abs/2209.00218v2
- Date: Mon, 31 Jul 2023 13:56:40 GMT
- Title: Isotropic Representation Can Improve Dense Retrieval
- Authors: Euna Jung, Jungwon Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee
- Abstract summary: High-performing dense retrieval models compute query and document representations using BERT.
BERT representations are known to follow an anisotropic distribution shaped like a narrow cone.
In this work, we show that isotropic representations can achieve generally improved performance.
- Score: 5.6435410094272696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent advancement in language representation modeling has broadly
affected the design of dense retrieval models. In particular, many of the
high-performing dense retrieval models compute query and document
representations using BERT and subsequently apply cosine-similarity-based
scoring to determine relevance. BERT representations, however, are known to
follow an anisotropic distribution shaped like a narrow cone, and such an
anisotropic distribution can be undesirable for cosine-similarity-based
scoring. In this work, we first show that BERT-based dense retrieval (DR)
representations also follow an anisotropic distribution. To cope with the
problem, we introduce the unsupervised post-processing methods of Normalizing
Flow and whitening, and develop a token-wise method in addition to the
sequence-wise method for applying the post-processing to the representations
of dense retrieval models. We show that the proposed methods can effectively
make the representations isotropic, and we then perform experiments with
ColBERT and RepBERT to show that document re-ranking performance (NDCG@10) can
be improved by 5.17%–8.09% for ColBERT and 6.88%–22.81% for RepBERT. To
examine the potential of isotropic representation for improving the robustness
of DR models, we investigate out-of-distribution tasks where the test dataset
differs from the training dataset. The results show that isotropic
representation can achieve generally improved performance. For instance, when
the training dataset is MS-MARCO and the test dataset is Robust04, isotropy
post-processing can improve the baseline performance by up to 24.98%.
Furthermore, we show that an isotropic model trained with an
out-of-distribution dataset can even outperform a baseline model trained with
the in-distribution dataset.
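To make the abstract's pipeline concrete, below is a minimal sketch in Python/NumPy of two of its ideas: quantifying anisotropy as the average pairwise cosine similarity of embeddings, and whitening as an unsupervised post-processing that renders embeddings isotropic. This is an illustrative approximation, not the authors' released code; the function names and the synthetic embeddings are assumptions, and the Normalizing Flow variant is omitted. Sequence-wise application would whiten one pooled vector per query or document (as in RepBERT), while token-wise application would whiten every token embedding before late-interaction scoring (as in ColBERT).

    import numpy as np

    def avg_pairwise_cosine(embs):
        """Mean cosine similarity over all distinct pairs; values near 1.0
        indicate a narrow-cone (anisotropic) distribution, near 0.0 isotropy."""
        unit = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        sims = unit @ unit.T
        n = embs.shape[0]
        return (sims.sum() - n) / (n * (n - 1))  # exclude the n self-similarities

    def fit_whitening(embs, eps=1e-8):
        """Estimate a whitening transform (mean and inverse square root of the
        covariance) from a sample of representations."""
        mu = embs.mean(axis=0)
        cov = np.cov(embs - mu, rowvar=False)
        u, s, _ = np.linalg.svd(cov)
        w = u @ np.diag(1.0 / np.sqrt(s + eps))
        return mu, w

    def whiten(embs, mu, w):
        """Whitened embeddings have zero mean and (approximately) identity
        covariance, i.e. an isotropic distribution."""
        return (embs - mu) @ w

    # Synthetic stand-in for BERT embeddings: a narrow cone around one direction.
    rng = np.random.default_rng(0)
    cone = rng.normal(size=(1000, 128)) * 0.1 + 1.0
    print("avg cosine before:", avg_pairwise_cosine(cone))               # close to 1.0
    mu, w = fit_whitening(cone)
    print("avg cosine after: ", avg_pairwise_cosine(whiten(cone, mu, w)))  # near 0.0

In an actual retrieval pipeline, the transform would be fitted once on a held-out sample of corpus embeddings and then applied to both query and document vectors before cosine scoring.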
Related papers
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query.
We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations [2.992602379681373]
We show that multi-modal fine-tuning can achieve notable OoDD performance.
We propose a training objective that enhances cross-modal alignment by regularizing the distances between image and text embeddings of ID data.
arXiv Detail & Related papers (2025-03-24T16:00:21Z)
- Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
- EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification [10.334396596691048]
We propose EntAugment, a tuning-free and adaptive DA framework.
It dynamically assesses and adjusts the augmentation magnitudes for each sample during training.
We also introduce a novel entropy regularization term, EntLoss, which complements the EntAugment approach.
arXiv Detail & Related papers (2024-09-10T07:42:47Z)
- DiffPuter: Empowering Diffusion Models for Missing Data Imputation [56.48119008663155]
This paper introduces DiffPuter, a tailored diffusion model combined with the Expectation-Maximization (EM) algorithm for missing data imputation.
Our theoretical analysis shows that DiffPuter's training step corresponds to the maximum likelihood estimation of data density.
Our experiments show that DiffPuter achieves an average improvement of 6.94% in MAE and 4.78% in RMSE compared to the most competitive existing method.
arXiv Detail & Related papers (2024-05-31T08:35:56Z)
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency [19.49531172542614]
We propose a unified framework that exploits the Probability of Necessity and Sufficiency to extract the Invariant Substructure (PNSIS).
Our model outperforms state-of-the-art techniques on several graph OOD benchmarks.
arXiv Detail & Related papers (2024-02-14T13:31:53Z)
- SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt).
arXiv Detail & Related papers (2023-11-30T03:05:14Z)
- Graph Out-of-Distribution Generalization with Controllable Data Augmentation [51.17476258673232]
Graph Neural Networks (GNNs) have demonstrated extraordinary performance in classifying graph properties.
Due to selection bias in the training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z)
- Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive weights for the reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Adaptive Graph-Based Feature Normalization for Facial Expression Recognition [1.2246649738388389]
We propose an Adaptive Graph-based Feature Normalization (AGFN) method to protect Facial Expression Recognition models from data uncertainties.
Our method outperforms state-of-the-art works with accuracies of 91.84% and 91.11% on benchmark datasets.
arXiv Detail & Related papers (2022-07-22T14:57:56Z)
- Learning by Erasing: Conditional Entropy based Transferable Out-Of-Distribution Detection [17.31471594748061]
Out-of-distribution (OOD) detection is essential to handle the distribution shifts between training and test scenarios.
Existing methods require retraining to capture the dataset-specific feature representation or data distribution.
We propose a deep generative model (DGM) based transferable OOD detection method that does not require retraining on a new ID dataset.
arXiv Detail & Related papers (2022-04-23T10:19:58Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.