DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
- URL: http://arxiv.org/abs/2208.11503v1
- Date: Wed, 24 Aug 2022 12:55:00 GMT
- Title: DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
- Authors: Zhengyang Tang, Benyou Wang, Ting Yao
- Abstract summary: Deep prompt tuning (DPT) has achieved great success in most natural language processing (NLP) tasks.
However, it is not well investigated in dense retrieval, where fine-tuning (FT) still dominates.
We propose two model-agnostic and task-agnostic strategies for DPT-based retrievers, namely retrieval-oriented intermediate pretraining and unified negative mining.
- Score: 53.217524851268216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep prompt tuning (DPT) has achieved great success in most natural language
processing (NLP) tasks. However, it is not well investigated in dense retrieval,
where fine-tuning (FT) still dominates. When deploying multiple retrieval tasks
on the same backbone model (e.g., RoBERTa), FT-based methods incur a high
deployment cost: each new retrieval model must deploy its own copy of the
backbone without any reuse. To reduce the deployment cost in such
a scenario, this work investigates applying DPT in dense retrieval. The
challenge is that directly applying DPT in dense retrieval largely
underperforms FT methods. To compensate for the performance drop, we propose
two model-agnostic and task-agnostic strategies for DPT-based retrievers,
namely retrieval-oriented intermediate pretraining and unified negative mining,
as a general approach that could be compatible with any pre-trained language
model and retrieval task. The experimental results show that the proposed
method (called DPTDR) outperforms previous state-of-the-art models on both
MS-MARCO and Natural Questions. We also conduct ablation studies to examine the
effectiveness of each strategy in DPTDR. We believe this work facilitates the
industry, as it saves enormous efforts and costs of deployment and increases
the utility of computing resources. Our code is available at
https://github.com/tangzhy/DPTDR.
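To make the deployment-cost argument concrete, below is a minimal, hypothetical PyTorch sketch of deep prompt tuning for a dual-encoder retriever: the transformer backbone is frozen and only per-layer prompt embeddings are trained, so a single backbone copy could in principle be reused across retrieval tasks with only the small prompt tensors swapped. All names (DeepPromptEncoder, PROMPT_LEN, the toy in-batch contrastive loss) are illustrative assumptions, not the DPTDR implementation; see the repository above for the authors' code.

```python
# Minimal sketch of deep prompt tuning for a dual-encoder retriever.
# Assumption: a toy transformer backbone; not the DPTDR codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

PROMPT_LEN, N_LAYERS, D_MODEL, N_HEADS = 8, 4, 256, 4


class DeepPromptEncoder(nn.Module):
    """Frozen transformer layers; only per-layer prompts are trainable."""

    def __init__(self, vocab_size=30522):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
            for _ in range(N_LAYERS)
        )
        # Deep prompts: one trainable prompt block per transformer layer.
        self.prompts = nn.ParameterList(
            nn.Parameter(torch.randn(PROMPT_LEN, D_MODEL) * 0.02)
            for _ in range(N_LAYERS)
        )
        # Freeze the backbone (embedding + transformer layers).
        for p in self.embed.parameters():
            p.requires_grad = False
        for p in self.layers.parameters():
            p.requires_grad = False

    def forward(self, input_ids):
        x = self.embed(input_ids)                      # (B, L, D)
        b = x.size(0)
        for layer, prompt in zip(self.layers, self.prompts):
            p = prompt.unsqueeze(0).expand(b, -1, -1)  # (B, P, D)
            x = layer(torch.cat([p, x], dim=1))        # prepend layer prompts
            x = x[:, PROMPT_LEN:]                      # drop prompt positions
        return x[:, 0]                                 # [CLS]-style pooling


def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.05):
    """In-batch negatives: each query's positive is the same-index passage."""
    scores = q_emb @ p_emb.t() / temperature
    labels = torch.arange(q_emb.size(0))
    return F.cross_entropy(scores, labels)


if __name__ == "__main__":
    query_enc, passage_enc = DeepPromptEncoder(), DeepPromptEncoder()
    q_ids = torch.randint(0, 30522, (2, 16))
    p_ids = torch.randint(0, 30522, (2, 64))
    loss = in_batch_contrastive_loss(query_enc(q_ids), passage_enc(p_ids))
    loss.backward()  # gradients flow only into the prompt parameters
```

Because the backbone parameters are frozen, only the per-layer prompt tensors differ between retrieval tasks; swapping them at serving time is what makes the deployment cheaper than FT in the scenario the abstract describes.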
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z) - BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free methods.
We maintain a lightweight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z) - Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference [4.913049603343811]
Existing remote physiological (rPPG) measurement methods often overlook the incremental learning scenario.
Most existing class-incremental learning approaches are unsuitable for rPPG measurement.
We present a novel method named ADDP to tackle continual learning for rPPG measurement.
arXiv Detail & Related papers (2024-07-19T01:49:09Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures [63.12993314908957]
We propose a base model search approach, denoted "DiffNAS".
We leverage GPT-4 as a supernet to expedite the search, supplemented with a search memory to enhance the results.
Rigorous experimentation corroborates that our algorithm can improve search efficiency by a factor of two under GPT-based scenarios.
arXiv Detail & Related papers (2023-10-07T09:10:28Z) - RDumb: A simple approach that questions our progress in continual test-time adaptation [12.374649969346441]
Test-Time Adaptation (TTA) allows updating pre-trained models to adapt to changing data distributions at deployment time.
Recent work proposed and applied methods for continual adaptation over long timescales.
We find that eventually all but one state-of-the-art methods collapse and perform worse than a non-adapting model.
arXiv Detail & Related papers (2023-06-08T17:52:34Z) - Task-guided Disentangled Tuning for Pretrained Language Models [16.429787408467703]
We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs).
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs.
arXiv Detail & Related papers (2022-03-22T03:11:39Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
We specifically propose to decompose confidence scoring as well as a modified input pre-processing method.
Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)