How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
- URL: http://arxiv.org/abs/2302.07452v1
- Date: Wed, 15 Feb 2023 03:53:26 GMT
- Title: How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
- Authors: Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
- Abstract summary: We show that a generalizable dense retriever can be trained to achieve high accuracy in both supervised and zero-shot retrieval.
DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations.
- Score: 80.54532535622988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various techniques have been developed in recent years to improve dense
retrieval (DR), such as unsupervised contrastive learning and pseudo-query
generation. Existing DRs, however, often suffer from effectiveness tradeoffs
between supervised and zero-shot retrieval, which some argue is due to
limited model capacity. We challenge this hypothesis and show that a
generalizable DR can be trained to achieve high accuracy in both supervised and
zero-shot retrieval without increasing model size. In particular, we
systematically examine the contrastive learning of DRs, under the framework of
Data Augmentation (DA). Our study shows that common DA practices, such as query
augmentation with generative models and pseudo-relevance label creation with a
cross-encoder, are often inefficient and sub-optimal. We therefore propose a new DA
approach with diverse queries and sources of supervision to progressively train
a generalizable DR. As a result, DRAGON, our dense retriever trained with
diverse augmentation, is the first BERT-base-sized DR to achieve
state-of-the-art effectiveness in both supervised and zero-shot evaluations and
even competes with models using more complex late interaction (ColBERTv2 and
SPLADE++).
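As a rough illustration of the training setup the abstract describes, here is a minimal PyTorch sketch of InfoNCE contrastive learning for a dual-encoder retriever with a progressive, multi-source label schedule. The `Encoder` stub, the random batches, and the three-stage label list are assumptions for illustration, not the released DRAGON code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in for a BERT-base tower; maps pre-featurized inputs to embeddings."""
    def __init__(self, dim_in=768, dim_out=768):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)
    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

def contrastive_step(q_enc, d_enc, queries, docs, optimizer, tau=0.05):
    """One InfoNCE step with in-batch negatives; docs[i] is the positive
    passage for queries[i], which may be human-written or generated."""
    q = q_enc(queries)                      # (B, d) query embeddings
    d = d_enc(docs)                         # (B, d) passage embeddings
    scores = q @ d.T / tau                  # (B, B) in-batch similarities
    labels = torch.arange(q.size(0))        # positives on the diagonal
    loss = F.cross_entropy(scores, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Progressive schedule: each stage adds a new source of (pseudo-)relevance
# labels (e.g. mined by a sparse retriever, a dense retriever, then a
# cross-encoder re-ranker); random tensors stand in for real batches here.
q_enc, d_enc = Encoder(), Encoder()
optimizer = torch.optim.AdamW(
    list(q_enc.parameters()) + list(d_enc.parameters()), lr=1e-5)
for stage in ["sparse-labels", "dense-labels", "cross-encoder-labels"]:
    queries, docs = torch.randn(8, 768), torch.randn(8, 768)
    print(stage, contrastive_step(q_enc, d_enc, queries, docs, optimizer))
```

The key design point is that diversity enters through the data (which queries and which labels fill each batch), not through the loss itself, which stays a standard contrastive objective.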
Related papers
- Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading [8.59772105902647]
Diabetic Retinopathy (DR) accounts for 5% of global blindness cases.
We introduce a novel deep learning method for achieving domain generalization (DG) in DR grading.
Our method demonstrates significant improvements over the strong Empirical Risk Minimization baseline.
arXiv Detail & Related papers (2024-11-04T21:09:24Z)
- Generalizing to Unseen Domains in Diabetic Retinopathy Classification [8.59772105902647]
We study the problem of generalizing a model to unseen distributions or domains in diabetic retinopathy classification.
We propose a simple and effective domain generalization (DG) approach that achieves self-distillation in vision transformers.
We report the performance of several state-of-the-art DG methods on open-source DR classification datasets.
arXiv Detail & Related papers (2023-10-26T09:11:55Z)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
The goal is to trick a DR model into retrieving a target document that lies outside the initial set of candidates the model returns.
We find that the promising results previously reported for attacking neural ranking models (NRMs) do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
arXiv Detail & Related papers (2023-08-19T00:24:59Z)
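A hedged sketch of what "attack as contrastive learning in a multi-view representation space" could look like: each view of the target document is pulled toward the query embedding and contrasted against the candidates the model currently retrieves. The view construction and loss details are assumptions for illustration, not the AREA authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def area_style_loss(q_emb, target_views, candidate_embs, tau=0.05):
    """q_emb: (d,) query embedding; target_views: (V, d) views of the target
    document; candidate_embs: (N, d) currently retrieved documents."""
    pos = target_views @ q_emb / tau                    # (V,) view-query scores
    neg = candidate_embs @ q_emb / tau                  # (N,) candidate scores
    neg = neg.unsqueeze(0).expand(pos.size(0), -1)      # (V, N)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)  # (V, 1+N)
    labels = torch.zeros(pos.size(0), dtype=torch.long) # column 0 is the positive
    return F.cross_entropy(logits, labels)

# Toy usage: gradients on the perturbable views drive the attack step.
d = 16
q = F.normalize(torch.randn(d), dim=0)
views = torch.randn(3, d, requires_grad=True)
cands = torch.randn(10, d)
area_style_loss(q, views, cands).backward()
```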
- Measuring the Robustness of NLP Models to Domain Shifts [50.89876374569385]
Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent capabilities such as in-context learning.
Current research focuses on challenge sets and relies solely on the Source Drop (SD), which uses the source in-domain performance as the reference point for degradation.
We argue that the Target Drop (TD), which measures degradation from the target in-domain performance, should be used as a complementary point of view.
arXiv Detail & Related papers (2023-05-31T20:25:08Z)
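A plausible reading of the two reference points above, as a small Python sketch; the paper's exact normalization may differ.

```python
# Source Drop: degradation relative to the source model's in-domain score.
# Target Drop: degradation relative to a model trained in the target domain.
def source_drop(src_in_domain: float, src_on_target: float) -> float:
    return (src_in_domain - src_on_target) / src_in_domain

def target_drop(tgt_in_domain: float, src_on_target: float) -> float:
    return (tgt_in_domain - src_on_target) / tgt_in_domain

# Example: a model scores 0.90 in-domain but 0.72 on the target domain,
# where a target-trained model scores 0.80.
print(source_drop(0.90, 0.72))  # 0.20
print(target_drop(0.80, 0.72))  # 0.10
```

The example shows why the two views are complementary: the same transfer result looks like a 20% drop against the source but only a 10% drop against what is achievable in the target domain.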
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- A Generalized Doubly Robust Learning Framework for Debiasing Post-Click Conversion Rate Prediction [23.340584290411208]
Post-click conversion rate (CVR) prediction is an essential task for discovering user interests and increasing platform revenues.
Currently, doubly robust (DR) learning approaches achieve state-of-the-art performance for debiasing CVR prediction.
We propose two new DR methods, namely DR-BIAS and DR-MSE, which control the bias of DR loss and balance the bias and variance flexibly.
arXiv Detail & Related papers (2022-11-12T15:09:23Z)
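For context, the classical doubly robust estimator that variants like DR-BIAS and DR-MSE refine combines an imputation model with an inverse-propensity correction on observed samples. A small NumPy sketch, with illustrative variable names:

```python
import numpy as np

def dr_loss(obs, err, err_hat, propensity):
    """obs: 1 if the conversion label is observed (user clicked), else 0;
    err: prediction error on observed labels (ignored where unobserved);
    err_hat: error imputed by an imputation model;
    propensity: estimated probability that the label is observed."""
    return np.mean(err_hat + obs * (err - err_hat) / propensity)

# Toy example: the inverse-propensity term corrects the imputation model
# only on the observed (clicked) entries.
obs = np.array([1, 0, 1, 0])
err = np.array([0.4, 0.0, 0.1, 0.0])
err_hat = np.array([0.3, 0.2, 0.2, 0.2])
prop = np.array([0.5, 0.5, 0.5, 0.5])
print(dr_loss(obs, err, err_hat, prop))
```

The "doubly robust" property is that the estimate stays unbiased if either the imputed errors or the propensities are accurate; the new methods target the remaining bias and variance of this loss.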
- Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval [54.349418995689284]
We propose a novel Dense Retrieval (DR) framework named Disentangled Dense Retrieval (DDR) to support effective domain adaptation for DR models.
By disentangling the Relevance Estimation Module (REM) from the Domain Adaptation Modules (DAMs), DDR enables a flexible training paradigm in which the REM is trained with supervision once and the DAMs are trained on unsupervised data.
DDR significantly improves ranking performance compared to strong DR baselines and substantially outperforms traditional retrieval methods in most scenarios.
arXiv Detail & Related papers (2022-08-11T11:18:50Z)
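A minimal sketch of the disentangled recipe above, assuming PyTorch stand-ins for the REM and DAMs and a placeholder denoising objective for the unsupervised stage; the supervised REM training is elided and the module shapes are assumptions.

```python
import torch
import torch.nn as nn

# Stand-ins: a shared Relevance Estimation Module (REM) and one Domain
# Adaptation Module (DAM) per target domain. Real modules would wrap a
# pretrained encoder; nn.Linear keeps the sketch runnable.
rem = nn.Linear(768, 768)
dams = {dom: nn.Linear(768, 768) for dom in ["bio", "legal"]}

def adapt_dams(rem, dams, domain_corpora):
    # The REM is assumed already trained once with relevance supervision
    # (elided); freeze it so only domain adaptation is learned.
    for p in rem.parameters():
        p.requires_grad = False
    for domain, corpus in domain_corpora.items():
        opt = torch.optim.AdamW(dams[domain].parameters(), lr=1e-4)
        for x in corpus:  # unlabeled in-domain batches
            noisy = x + 0.1 * torch.randn_like(x)
            # placeholder unsupervised objective (denoising); the paper's
            # actual objective differs
            loss = ((dams[domain](noisy) - x) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

adapt_dams(rem, dams, {dom: [torch.randn(4, 768)] for dom in dams})
```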
- Augmentation-induced Consistency Regularization for Classification [25.388324221293203]
We propose a consistency regularization framework based on data augmentation, called CR-Aug.
CR-Aug forces the output distributions of different sub-models generated by data augmentation to be consistent with each other.
We apply CR-Aug to image and audio classification tasks and conduct extensive experiments to verify its effectiveness.
arXiv Detail & Related papers (2022-05-25T03:15:36Z)
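A minimal sketch of consistency regularization in the spirit of CR-Aug: classify two augmented views and penalize disagreement between their predictive distributions. The symmetric-KL penalty, the Gaussian-noise "augmentation", and the weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cr_aug_loss(model, x_view1, x_view2, labels, alpha=1.0):
    logits1, logits2 = model(x_view1), model(x_view2)
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    # symmetric KL between the two views' predictive distributions
    consistency = (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                   + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * consistency

# Toy usage with a linear classifier and noise as the augmentation.
model = nn.Linear(32, 10)
x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
loss = cr_aug_loss(model, x + 0.1 * torch.randn_like(x),
                   x + 0.1 * torch.randn_like(x), labels)
loss.backward()
```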
- Doubly Robust Collaborative Targeted Learning for Recommendation on Data Missing Not at Random [6.563595953273317]
In recommender systems, the feedback data received is always missing not at random (MNAR).
We propose DR-TMLE, which effectively captures the merits of both error-imputation-based (EIB) and doubly robust (DR) methods.
We also propose a novel RCT-free collaborative targeted learning algorithm for DR-TMLE, called DR-TMLE-TL.
arXiv Detail & Related papers (2022-03-19T06:48:50Z)
- Analyzing Dynamic Adversarial Training Data in the Limit [50.00850852546616]
Dynamic adversarial data collection (DADC) holds promise as an approach for generating diverse training sets.
We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs.
Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data.
arXiv Detail & Related papers (2021-10-16T08:48:52Z)