Approximate Nearest Neighbor Negative Contrastive Learning for Dense
Text Retrieval
- URL: http://arxiv.org/abs/2007.00808v2
- Date: Tue, 20 Oct 2020 22:17:19 GMT
- Title: Approximate Nearest Neighbor Negative Contrastive Learning for Dense
Text Retrieval
- Authors: Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul
Bennett, Junaid Ahmed, Arnold Overwijk
- Abstract summary: This paper presents Approximate nearest neighbor Negative Contrastive Estimation (ANCE), a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus.
In our experiments, ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and sparse retrieval baselines.
It nearly matches the accuracy of sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned representation space and provides almost 100x speed-up.
- Score: 20.62375162628628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conducting text retrieval in a dense learned representation space has many
intriguing advantages over sparse retrieval. Yet the effectiveness of dense
retrieval (DR) often requires combination with sparse retrieval. In this paper,
we identify that the main bottleneck is in the training mechanisms, where the
negative instances used in training are not representative of the irrelevant
documents in testing. This paper presents Approximate nearest neighbor Negative
Contrastive Estimation (ANCE), a training mechanism that constructs negatives
from an Approximate Nearest Neighbor (ANN) index of the corpus, which is
parallelly updated with the learning process to select more realistic negative
training instances. This fundamentally resolves the discrepancy between the
data distribution used in the training and testing of DR. In our experiments,
ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and
sparse retrieval baselines. It nearly matches the accuracy of
sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned
representation space and provides almost 100x speed-up.
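The mechanism described in the abstract is straightforward to illustrate: encode the corpus with a recent checkpoint of the encoder, retrieve each training query's top-ranked documents as hard negatives, and keep refreshing that index in parallel with training. The sketch below is a hypothetical, simplified illustration rather than the authors' released code: the `encoder` object, the single-query `ance_step`, and the brute-force dot-product search standing in for the ANN index are all assumptions made for brevity; a real implementation would batch queries and rebuild an ANN index (e.g. with FAISS) asynchronously from periodic checkpoints.

```python
import torch
import torch.nn.functional as F


def rebuild_index(encoder, corpus_docs):
    """Re-encode the corpus with a recent encoder checkpoint.

    Stands in for the refreshed ANN index; a real system would load these
    vectors into an ANN library such as FAISS.
    """
    with torch.no_grad():
        return torch.stack([encoder(doc) for doc in corpus_docs])


def mine_hard_negatives(encoder, query, doc_index, positive_id, num_negatives=8):
    """Retrieve top-ranked documents by dot product and drop the labeled positive."""
    with torch.no_grad():
        scores = doc_index @ encoder(query)
    ranked = torch.argsort(scores, descending=True).tolist()
    return [i for i in ranked if i != positive_id][:num_negatives]


def ance_step(encoder, optimizer, query, pos_doc, neg_docs):
    """One contrastive update: the positive document vs. index-mined negatives."""
    q = encoder(query)
    d_pos = encoder(pos_doc)
    d_neg = torch.stack([encoder(d) for d in neg_docs])
    logits = torch.cat([(q * d_pos).sum().view(1), d_neg @ q]).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)  # class 0 = the positive document
    loss = F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, `rebuild_index` would be called on a checkpoint every so many steps, in parallel with the training loop, so that the mined negatives track the current representation space rather than a static one.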
Related papers
- Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers [6.773411876899064]
Inference-free sparse models lag far behind both sparse and dense siamese models in terms of search relevance.
We propose two different approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which incorporates Inverted Document Frequency (IDF) into the sparsification of representations (a hypothetical sketch of such a regularizer appears after this list).
We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency.
arXiv Detail & Related papers (2024-11-07T03:46:43Z) - Mitigating the Impact of False Negatives in Dense Retrieval with
Contrastive Confidence Regularization [15.204113965411777]
We propose a novel contrastive confidence regularizer for Noise Contrastive Estimation (NCE) loss.
Our analysis shows that the regularizer helps dense retrieval models be more robust against false negatives with a theoretical guarantee.
arXiv Detail & Related papers (2023-12-30T08:01:57Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Unsupervised Dense Retrieval with Relevance-Aware Contrastive
Pre-Training [81.3781338418574]
We propose relevance-aware contrastive learning.
We consistently improve the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks.
Our method not only beats BM25 after further pre-training on the target corpus but also serves as a good few-shot learner.
arXiv Detail & Related papers (2023-06-05T18:20:27Z) - Test-Time Distribution Normalization for Contrastively Learned
Vision-language Models [39.66329310098645]
One of the most representative recently proposed approaches, CLIP, has garnered widespread adoption due to its effectiveness.
This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information at test time.
We propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use this mean to represent what would be analogous to negative samples in the InfoNCE loss (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-02-22T01:14:30Z) - Bridging the Training-Inference Gap for Dense Phrase Retrieval [104.4836127502683]
Building dense retrievers requires a series of standard procedures, including training and validating neural models.
In this paper, we explore how the gap between training and inference in dense retrieval can be reduced.
We propose an efficient way of validating dense retrievers using a small subset of the entire corpus.
arXiv Detail & Related papers (2022-10-25T00:53:06Z) - LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text
Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive Similarity Mining and Consistency Learning (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
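For the IDF-aware FLOPS loss mentioned in the list above, the following is a hypothetical sketch rather than that paper's exact formulation: it pairs the standard FLOPS regularizer (the squared mean activation of each vocabulary term over a batch) with an illustrative IDF-derived weighting that penalizes frequent, low-IDF terms more heavily; the `1 / idf` weighting is an assumption.

```python
import torch


def flops_loss(doc_reps: torch.Tensor) -> torch.Tensor:
    """Standard FLOPS regularizer: squared mean activation per vocabulary term.

    doc_reps: (batch, vocab) non-negative sparse term weights.
    """
    return (doc_reps.abs().mean(dim=0) ** 2).sum()


def idf_aware_flops_loss(doc_reps: torch.Tensor, idf: torch.Tensor) -> torch.Tensor:
    """Hypothetical IDF-aware variant: push frequent (low-IDF) terms toward zero
    more aggressively, so sparsification prunes uninformative terms first."""
    weights = 1.0 / (idf + 1e-6)  # illustrative weighting, larger for frequent terms
    per_term = doc_reps.abs().mean(dim=0) ** 2
    return (weights * per_term).sum()
```

Either term would typically be added to the retrieval loss with a small coefficient, trading search relevance against the number of non-zero term weights.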
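For the test-time Distribution Normalization entry above, here is a rough, hypothetical sketch of the idea as summarized there, not that paper's exact procedure: the mean embedding of a held-out batch of test samples stands in for the negatives of the InfoNCE loss, and the dot-product score is corrected by subtracting each embedding's similarity to that mean (the symmetric form and the one-half weighting are assumptions).

```python
import torch
import torch.nn.functional as F


def dn_similarity(image_emb, text_emb, test_image_batch, test_text_batch):
    """Dot-product similarity corrected with batch-mean 'negative' embeddings.

    image_emb, text_emb: (d,) embeddings of the pair being scored.
    test_image_batch, test_text_batch: (n, d) embeddings of held-out test
    samples whose means approximate the negative terms of the InfoNCE loss.
    """
    x = F.normalize(image_emb, dim=-1)
    y = F.normalize(text_emb, dim=-1)
    mu_img = F.normalize(test_image_batch, dim=-1).mean(dim=0)
    mu_txt = F.normalize(test_text_batch, dim=-1).mean(dim=0)
    # Zeroth-order score is the plain dot product x @ y; the correction treats
    # the batch means as stand-ins for InfoNCE negatives (illustrative weighting).
    return x @ y - 0.5 * (x @ mu_txt + y @ mu_img)
```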