GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
- URL: http://arxiv.org/abs/2112.07577v1
- Date: Tue, 14 Dec 2021 17:34:43 GMT
- Title: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
- Authors: Kexin Wang, Nandan Thakur, Nils Reimers, Iryna Gurevych
- Abstract summary: We propose a novel unsupervised domain adaptation method, Generative Pseudo Labeling (GPL). On six representative domain-specialized datasets, we find the proposed approach can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 8.9 points nDCG@10.
- Score: 43.43401655948693
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Dense retrieval approaches can overcome the lexical gap and lead to
significantly improved search results. However, they require large amounts of
training data which is not available for most domains. As shown in previous
work (Thakur et al., 2021b), the performance of dense retrievers severely
degrades under a domain shift. This limits the usage of dense retrieval
approaches to only a few domains with large training datasets.
In this paper, we propose the novel unsupervised domain adaptation method
Generative Pseudo Labeling (GPL), which combines a query generator with pseudo
labeling from a cross-encoder. On six representative domain-specialized
datasets, we find the proposed GPL can outperform an out-of-the-box
state-of-the-art dense retrieval approach by up to 8.9 points nDCG@10. GPL
requires less (unlabeled) data from the target domain and is more robust in its
training than previous methods.
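The pipeline described above can be sketched as follows. This is an illustrative outline only, not the authors' code: the query generator and cross-encoder are toy word-overlap stand-ins for the real models (a T5-style generator and a trained cross-encoder teacher).

```python
# Minimal sketch of how GPL constructs pseudo-labeled training data.
# generate_query and teacher_score are hypothetical toy stand-ins.

def generate_query(passage):
    # stand-in for a T5-style query generator
    return " ".join(passage.split()[:3])

def teacher_score(query, passage):
    # stand-in for the cross-encoder "teacher" (Jaccard word overlap)
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q | p), 1)

def margin_mse_loss(student_margin, teacher_margin):
    # GPL trains the bi-encoder so that its score margin between the
    # positive and negative passage matches the teacher's margin
    return (student_margin - teacher_margin) ** 2

corpus = [
    "dense retrieval maps queries and passages to vectors",
    "pseudo labeling uses a cross encoder as teacher",
]

training_examples = []
for positive in corpus:
    query = generate_query(positive)                      # 1. synthetic query
    negative = max((p for p in corpus if p != positive),  # 2. mined negative
                   key=lambda p: teacher_score(query, p))
    teacher_margin = (teacher_score(query, positive)
                      - teacher_score(query, negative))   # 3. pseudo label
    training_examples.append((query, positive, negative, teacher_margin))
```

In the actual method, step 2 retrieves hard negatives with an existing dense retriever, and the margin in step 3 supervises the bi-encoder via the MarginMSE loss above.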
We further investigate the role of six recent pre-training methods in the
scenario of domain adaptation for retrieval tasks, where only three could yield
improved results. The best approach, TSDAE (Wang et al., 2021), can be combined
with GPL, yielding another average improvement of 1.0 points nDCG@10 across the
six tasks.
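All of the gains above are reported in nDCG@10, the standard ranking metric on these benchmarks. For reference, a minimal computation with binary relevance labels:

```python
import math

def dcg_at_k(relevances, k=10):
    # DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1)
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    # normalize by the DCG of the ideal (descending) ordering
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# one relevant document ranked at position 2 instead of position 1
print(ndcg_at_k([0, 1, 0, 0]))  # 1 / log2(3) ≈ 0.631
```

With graded (non-binary) relevance, many implementations use the gain 2^rel - 1 instead of rel; for binary labels the two definitions coincide.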
Related papers
- Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation [0.649970685896541]
A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL).
We analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-adapted model.
Our remining R-GPL approach boosts ranking performance in 13/14 BEIR datasets and 9/12 LoTTE datasets.
arXiv Detail & Related papers (2025-01-24T12:02:37Z)
- Improving Domain Adaptation Through Class Aware Frequency Transformation [15.70058524548143]
Most of the Unsupervised Domain Adaptation (UDA) algorithms focus on reducing the global domain shift between labelled source and unlabelled target domains.
We propose a novel approach based on the traditional image processing technique Class Aware Frequency Transformation (CAFT).
CAFT utilizes pseudo label based class consistent low-frequency swapping for improving the overall performance of the existing UDA algorithms.
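The low-frequency swapping that CAFT builds on can be sketched in the frequency domain: replace the low-frequency amplitude band of a source image with that of a target image while keeping the source phase. The function below is an illustrative sketch of this general technique, not the CAFT implementation; the band-size parameter `beta` is a hypothetical name.

```python
import numpy as np

def swap_low_frequency(source, target, beta=0.1):
    """Replace the low-frequency amplitude of `source` with that of
    `target`; `beta` sets the half-width of the swapped band.
    Illustrative sketch only, not the paper's code."""
    fs = np.fft.fftshift(np.fft.fft2(source))
    ft = np.fft.fftshift(np.fft.fft2(target))
    amp_s, phase_s = np.abs(fs), np.angle(fs)
    amp_t = np.abs(ft)

    h, w = source.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    cy, cx = h // 2, w // 2
    # swap the centred (low-frequency) amplitude block
    amp_s[cy - bh:cy + bh, cx - bw:cx + bw] = \
        amp_t[cy - bh:cy + bh, cx - bw:cx + bw]

    mixed = amp_s * np.exp(1j * phase_s)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

rng = np.random.default_rng(0)
src = rng.random((32, 32))
tgt = rng.random((32, 32))
out = swap_low_frequency(src, tgt)
```

CAFT's addition is to make this swapping class-consistent, using pseudo labels to pair source and target samples of the same class.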
arXiv Detail & Related papers (2024-07-28T18:16:41Z)
- Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap [11.96884248631201]
We tackle the multimodal version of the unsupervised domain generalization problem.
Our framework relies on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space.
We show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization.
arXiv Detail & Related papers (2024-02-06T21:29:37Z)
- AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation [15.905869933337101]
Unsupervised domain adaptation (UDA) is a transfer learning task in which the data and annotations of the source domain are available, but only unlabeled target data is accessible during training.
We propose to improve the unsupervised domain adaptation task with an inter-domain sample matching scheme.
We apply the widely-used and robust Triplet loss to match the inter-domain samples.
To reduce the catastrophic effect of the inaccurate pseudo-labels generated during training, we propose a novel uncertainty measurement method to select reliable pseudo-labels automatically and progressively refine them.
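The triplet loss used for the inter-domain matching above is the standard formulation: pull the positive sample closer to the anchor than the negative, by at least a margin. A minimal sketch:

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss with Euclidean distance:
    max(0, d(a, p) - d(a, n) + margin). Illustrative sketch."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# a well-separated triplet incurs no loss
print(triplet_loss([0, 0], [0.1, 0], [5, 5]))  # → 0.0
```

In the cross-domain setting, anchors come from one domain and positives/negatives from the other, with pairs formed via (uncertainty-filtered) pseudo labels.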
arXiv Detail & Related papers (2022-11-16T13:04:24Z)
- Source-Free Domain Adaptation via Distribution Estimation [106.48277721860036]
Domain Adaptation aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain whose data distributions are different.
Recently, Source-Free Domain Adaptation (SFDA) has drawn much attention, which tries to tackle domain adaptation problem without using source data.
In this work, we propose a novel framework called SFDA-DE to address SFDA task via source Distribution Estimation.
arXiv Detail & Related papers (2022-04-24T12:22:19Z)
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation [74.71931918541748]
We propose an instance affinity based criterion for source to target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvement by 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering [16.39667909141402]
Few-shot classification tends to struggle when it needs to adapt to diverse domains.
We introduce a framework, DaFeC, to improve Domain adaptation performance for Few-shot classification via Clustering.
Our approach outperforms previous work with absolute gains (in classification accuracy) of 4.95%, 9.55%, 3.99% and 11.62%, respectively.
arXiv Detail & Related papers (2020-06-23T08:17:48Z)
- Sparsely-Labeled Source Assisted Domain Adaptation [64.75698236688729]
This paper proposes a novel Sparsely-Labeled Source Assisted Domain Adaptation (SLSA-DA) algorithm.
Due to the label scarcity problem, the projected clustering is conducted on both the source and target domains.
arXiv Detail & Related papers (2020-05-08T15:37:35Z)
- Supervised Domain Adaptation using Graph Embedding [86.3361797111839]
Domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them.
We propose a generic framework based on graph embedding.
We show that the proposed approach leads to a powerful Domain Adaptation framework.
arXiv Detail & Related papers (2020-03-09T12:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.