Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation
- URL: http://arxiv.org/abs/2501.14434v1
- Date: Fri, 24 Jan 2025 12:02:37 GMT
- Title: Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation
- Authors: Goksenin Yuksel, David Rau, Jaap Kamps
- Abstract summary: A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL).
We analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-adapted model.
Our remining R-GPL approach boosts ranking performance on 13/14 BEIR datasets and 9/12 LoTTE datasets.
- Score: 0.649970685896541
- Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill knowledge from a cross-encoder to dense retrievers in the target domain. In this paper, we analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-domain-adapted model. We then propose refreshing the hard-negative index during the knowledge distillation phase to mine better hard negatives. Our remining R-GPL approach boosts ranking performance on 13/14 BEIR datasets and 9/12 LoTTE datasets. Our contributions are (i) analyzing hard negatives returned by domain-adapted and non-domain-adapted models and (ii) applying GPL training with and without hard-negative re-mining on the LoTTE and BEIR datasets.
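Concretely, the proposed change amounts to one modification of the GPL training loop: instead of freezing the hard-negative index after the initial mining, the index is periodically rebuilt with the partially adapted retriever. The sketch below illustrates that loop with toy stand-ins; ToyBiEncoder, the 500-step refresh interval, and the use of a second bi-encoder as the teacher are illustrative assumptions (GPL distills from a cross-encoder with a MarginMSE objective), not the authors' implementation.

```python
import torch
import torch.nn.functional as F

class ToyBiEncoder(torch.nn.Module):
    """Stand-in dense retriever: one linear projection with cosine scoring."""
    def __init__(self, dim=32):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def encode(self, x):
        return F.normalize(self.proj(x), dim=-1)

    def score(self, q, d):
        return (self.encode(q) * self.encode(d)).sum(-1)

def mine_hard_negatives(model, queries, corpus, k=3):
    """Top-k retrieval with the *current* model. In plain GPL this runs once
    before training; in R-GPL it is re-run so negatives track the adapting
    retriever."""
    with torch.no_grad():
        sims = model.encode(queries) @ model.encode(corpus).T
    return sims.topk(k, dim=-1).indices  # shape: (n_queries, k)

# Toy stand-ins for GPL's synthetic queries, pseudo-positives, and corpus.
queries, corpus = torch.randn(16, 32), torch.randn(100, 32)
positives = torch.randint(0, 100, (16,))
student = ToyBiEncoder()
teacher = ToyBiEncoder()  # stand-in for GPL's cross-encoder teacher
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

negatives = mine_hard_negatives(student, queries, corpus)  # initial index
for step in range(2000):
    # The R-GPL change: periodically rebuild the hard-negative index.
    if step > 0 and step % 500 == 0:
        negatives = mine_hard_negatives(student, queries, corpus)
    pick = torch.randint(0, 3, (16,))
    neg = corpus[negatives[torch.arange(16), pick]]
    pos = corpus[positives]
    # GPL's MarginMSE objective: match the teacher's pos-neg score margin.
    with torch.no_grad():
        t_margin = teacher.score(queries, pos) - teacher.score(queries, neg)
    s_margin = student.score(queries, pos) - student.score(queries, neg)
    loss = F.mse_loss(s_margin, t_margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The only difference from plain GPL is the re-mining branch inside the loop; everything else is standard MarginMSE distillation.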
Related papers
- Cyclically Disentangled Feature Translation for Face Anti-spoofing [61.70377630461084]
We propose a novel domain adaptation method called cyclically disentangled feature translation network (CDFTN).
CDFTN generates pseudo-labeled samples that possess: 1) source domain-invariant liveness features and 2) target domain-specific content features, which are disentangled through domain adversarial training.
A robust classifier is trained based on the synthetic pseudo-labeled images under the supervision of source domain labels.
arXiv Detail & Related papers (2022-12-07T14:12:34Z) - Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the strengths of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
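As background for the alignment term above, here is a minimal sketch of a kernel MMD loss between source-like and target-specific feature batches. The RBF bandwidth and the plain in-memory tensors are illustrative assumptions; DaC additionally draws one side from a memory bank, which is omitted here.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimator of squared Maximum Mean Discrepancy with an RBF
    kernel between two feature batches x (n, d) and y (m, d)."""
    def kernel(a, b):
        # Pairwise squared distances via the expanded quadratic form.
        d2 = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2 * a @ b.T
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Source-like vs. target-specific feature batches (toy data).
src_like = torch.randn(64, 128)
tgt_spec = torch.randn(64, 128) + 0.5
loss = rbf_mmd2(src_like, tgt_spec)  # minimized to align the two groups
```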
arXiv Detail & Related papers (2022-11-12T09:21:49Z) - Low-confidence Samples Matter for Domain Adaptation [47.552605279925736]
Domain adaptation (DA) aims to transfer knowledge from a label-rich source domain to a related but label-scarce target domain.
We propose a novel contrastive learning method that exploits low-confidence samples.
We evaluate the proposed method in both unsupervised and semi-supervised DA settings.
arXiv Detail & Related papers (2022-02-06T15:45:45Z) - GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval [43.43401655948693]
We propose a novel unsupervised domain adaptation method, Generative Pseudo Labeling (GPL).
On six representative domain-specialized datasets, we find the proposed approach can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 8.9 points nDCG@10.
arXiv Detail & Related papers (2021-12-14T17:34:43Z) - Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations [24.703464680511154]
We propose Momentum adversarial Domain Invariant Representation learning (MoDIR).
MoDIR trains a domain classifier to distinguish source from target data, and then adversarially updates the DR encoder to learn domain-invariant representations.
Our experiments show that MoDIR robustly outperforms its baselines on 10+ ranking datasets from the BEIR benchmark in the zero-shot setup.
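The adversarial update described above can be illustrated with the standard gradient-reversal construction from DANN-style training: the domain classifier is trained normally while the reversed gradient pushes the encoder toward features the classifier cannot separate. This is a generic sketch; MoDIR's defining momentum mechanism is omitted, and all module sizes are illustrative.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward
    pass, so training the domain classifier pushes the encoder toward
    domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = torch.nn.Linear(128, 64)   # stand-in for the DR encoder
domain_clf = torch.nn.Linear(64, 2)  # source-vs-target classifier
opt = torch.optim.Adam(list(encoder.parameters()) + list(domain_clf.parameters()))

src, tgt = torch.randn(32, 128), torch.randn(32, 128) + 1.0
feats = encoder(torch.cat([src, tgt]))
labels = torch.cat([torch.zeros(32, dtype=torch.long),
                    torch.ones(32, dtype=torch.long)])
logits = domain_clf(GradReverse.apply(feats, 1.0))
loss = F.cross_entropy(logits, labels)  # one adversarial training step
opt.zero_grad()
loss.backward()
opt.step()
```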
arXiv Detail & Related papers (2021-10-14T17:45:06Z) - Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z) - Adaptive Pseudo-Label Refinement by Negative Ensemble Learning for Source-Free Unsupervised Domain Adaptation [35.728603077621564]
Existing Unsupervised Domain Adaptation (UDA) methods presume that source and target domain data are simultaneously available during training.
A pre-trained source model is always assumed to be available, even though it may perform poorly on the target domain due to the well-known domain shift problem.
We propose a unified method to tackle adaptive noise filtering and pseudo-label refinement.
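For intuition about noise filtering over pseudo-labels, below is a minimal confidence-threshold filter applied to a source model's predictions on target data. This is a simple proxy, not the paper's negative ensemble learning; the 0.9 threshold and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def filter_pseudo_labels(logits, threshold=0.9):
    """Keep only target samples whose maximum softmax probability clears a
    confidence threshold; the rest are treated as noisy and discarded."""
    probs = F.softmax(logits, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= threshold
    return labels[mask], mask

logits = torch.randn(256, 10)  # source model's predictions on target data
labels, mask = filter_pseudo_labels(logits)
print(f"kept {int(mask.sum())}/{mask.numel()} pseudo-labels")
```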
arXiv Detail & Related papers (2021-03-29T22:18:34Z) - A Free Lunch for Unsupervised Domain Adaptive Object Detection without Source Data [69.091485888121]
Unsupervised domain adaptation assumes that source and target domain data are freely available and usually trains on them jointly to reduce the domain gap.
We propose a source data-free domain adaptive object detection (SFOD) framework by modeling it as a problem of learning with noisy labels.
arXiv Detail & Related papers (2020-12-10T01:42:35Z) - Self-training Avoids Using Spurious Features Under Domain Shift [54.794607791641745]
In unsupervised domain adaptation, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
We identify and analyze one particular setting where the domain shift can be large: certain spurious features correlate with the label in the source domain but are independent of the label in the target domain.
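Since conditional entropy minimization is the mechanism under analysis, a minimal sketch follows: the loss is the mean Shannon entropy of the model's predictive distribution on unlabeled target data, so minimizing it drives predictions toward confident, low-entropy outputs. The linear stand-in model is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def conditional_entropy(logits):
    """Mean Shannon entropy of the predicted class distribution; minimizing
    it on unlabeled target data sharpens the model's predictions."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(-1).mean()

model = torch.nn.Linear(20, 5)   # stand-in classifier
target_x = torch.randn(128, 20)  # unlabeled target-domain batch
loss = conditional_entropy(model(target_x))  # no labels required
loss.backward()
```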
arXiv Detail & Related papers (2020-06-17T17:51:42Z) - Supervised Domain Adaptation using Graph Embedding [86.3361797111839]
Domain adaptation methods assume that the distributions of the two domains are shifted and attempt to realign them.
We propose a generic framework based on graph embedding.
We show that the proposed approach leads to a powerful Domain Adaptation framework.
arXiv Detail & Related papers (2020-03-09T12:25:13Z)