Efficiently Teaching an Effective Dense Retriever with Balanced Topic
Aware Sampling
- URL: http://arxiv.org/abs/2104.06967v1
- Date: Wed, 14 Apr 2021 16:49:18 GMT
- Title: Efficiently Teaching an Effective Dense Retriever with Balanced Topic
Aware Sampling
- Authors: Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin,
Allan Hanbury
- Abstract summary: TAS-Balanced is an efficient topic-aware query and balanced margin sampling technique.
We show that our TAS-Balanced training method achieves state-of-the-art low-latency (64ms per query) results on two TREC Deep Learning Track query sets.
- Score: 37.01593605084575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A vital step towards the widespread adoption of neural retrieval models is
their resource efficiency throughout the training, indexing and query
workflows. The neural IR community made great advancements in training
effective dual-encoder dense retrieval (DR) models recently. A dense text
retrieval model uses a single vector representation per query and passage to
score a match, which enables low-latency first stage retrieval with a nearest
neighbor search. Increasingly common training approaches require enormous
compute power, as they either sample negative passages from a continuously
refreshed index or require very large batch sizes for in-batch negative
sampling. Instead of relying on more compute capability, we
introduce an efficient topic-aware query and balanced margin sampling
technique, called TAS-Balanced. We cluster queries once before training and
sample queries out of a cluster per batch. We train our lightweight 6-layer DR
model with a novel dual-teacher supervision that combines pairwise and in-batch
negative teachers. Our method is trainable on a single consumer-grade GPU in
under 48 hours (as opposed to a common configuration of 8x V100s). We show that
our TAS-Balanced training method achieves state-of-the-art low-latency (64ms
per query) results on two TREC Deep Learning Track query sets. Evaluated on
NDCG@10, we outperform BM25 by 44%, a plainly trained DR by 19%, docT5query by
11%, and the previous best DR model by 5%. Additionally, TAS-Balanced produces
the first dense retriever that outperforms every other method on recall at any
cutoff on TREC-DL and allows more resource intensive re-ranking models to
operate on fewer passages to improve results further.
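The core of TAS-Balanced is how each training batch is composed: queries are clustered into topics once before training, each batch draws its queries from a single cluster, and the (positive, negative) passage pair for every query is drawn from teacher-score margin bins so that easy and hard pairs are balanced. The following Python sketch illustrates that batch composition under several assumptions: query vectors come from any off-the-shelf encoder, k-means is used for the one-off clustering, and teacher margins have already been pre-binned per query; all function and variable names are illustrative, not the authors' released code.

```python
# Illustrative sketch of TAS-Balanced batch composition (not the authors' code).
import random
from collections import defaultdict

from sklearn.cluster import KMeans


def cluster_queries(query_ids, query_embeddings, n_clusters=2000, seed=42):
    """One-off topic clustering of all training queries, done before training."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(query_embeddings)
    clusters = defaultdict(list)
    for qid, label in zip(query_ids, labels):
        clusters[int(label)].append(qid)
    return list(clusters.values())


def tas_balanced_batch(clusters, binned_pairs_by_query, batch_size=32):
    """Sample one topic cluster, then draw queries from it; for each query pick a
    (positive, negative) pair from a randomly chosen teacher-margin bin so that
    easy and hard pairs are evenly represented inside the batch."""
    cluster = random.choice(clusters)
    candidates = [q for q in cluster if binned_pairs_by_query.get(q)]
    if not candidates:
        return []
    batch = []
    for _ in range(batch_size):
        qid = random.choice(candidates)
        bins = binned_pairs_by_query[qid]   # {margin_bin: [(pos_id, neg_id), ...]}
        margin_bin = random.choice(list(bins))
        pos_id, neg_id = random.choice(bins[margin_bin])
        batch.append((qid, pos_id, neg_id))
    return batch
```

Because the clustering happens only once before training, the per-batch sampling adds negligible cost compared with approaches that refresh a negative index during training.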
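The abstract also mentions a dual-teacher supervision that combines a pairwise teacher with an in-batch-negatives teacher. Below is a hedged PyTorch sketch of one plausible way to combine the two signals: a Margin-MSE term that matches the student's pairwise score margin to the pairwise teacher's, plus a KL-divergence term that distills the in-batch teacher's score matrix into the student's in-batch dot-product scores. The exact losses, their weighting, and all tensor names here are assumptions for illustration and may differ from the paper's formulation.

```python
# Hedged sketch of a dual-teacher distillation loss (assumed formulation).
import torch
import torch.nn.functional as F


def dual_teacher_loss(student_q, student_p_pos, student_p_neg,
                      teacher_margin, teacher_inbatch_scores, alpha=1.0):
    """student_q:             [B, d] query vectors from the lightweight student encoder
    student_p_pos:          [B, d] positive passage vectors
    student_p_neg:          [B, d] negative passage vectors
    teacher_margin:         [B]    pairwise teacher margin s_t(q, p+) - s_t(q, p-)
    teacher_inbatch_scores: [B, 2B] in-batch teacher scores over all batch passages
    """
    # Pairwise part: match the student's score margin to the teacher's (Margin-MSE).
    s_pos = (student_q * student_p_pos).sum(-1)               # [B]
    s_neg = (student_q * student_p_neg).sum(-1)               # [B]
    pairwise_loss = F.mse_loss(s_pos - s_neg, teacher_margin)

    # In-batch part: score every query against every passage in the batch and
    # distill the in-batch teacher's score distribution into the student.
    all_passages = torch.cat([student_p_pos, student_p_neg], dim=0)   # [2B, d]
    student_scores = student_q @ all_passages.t()                     # [B, 2B]
    inbatch_loss = F.kl_div(
        F.log_softmax(student_scores, dim=-1),
        F.softmax(teacher_inbatch_scores, dim=-1),
        reduction="batchmean",
    )
    return pairwise_loss + alpha * inbatch_loss
```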
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) without requiring prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS).
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - HYRR: Hybrid Infused Reranking for Passage Retrieval [18.537666294601458]
Hybrid Infused Reranking for Passage Retrieval (HYRR) is a framework for training rerankers based on a hybrid of BM25 and neural retrieval models.
We present evaluations on a supervised passage retrieval task using MS MARCO and zero-shot retrieval tasks using BEIR.
arXiv Detail & Related papers (2022-12-20T18:44:21Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z) - LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text
Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z) - Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of information from both intra- and inter-images.
It is even comparable to the contrastive learning methods when only half of training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z) - An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in training samples using teacher model predictions.
By considering the multi-task network, training of the student model's feature extraction becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)