Document Ranking with a Pretrained Sequence-to-Sequence Model
- URL: http://arxiv.org/abs/2003.06713v1
- Date: Sat, 14 Mar 2020 22:29:50 GMT
- Title: Document Ranking with a Pretrained Sequence-to-Sequence Model
- Authors: Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
- Abstract summary: We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
- Score: 56.44269917346376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work proposes a novel adaptation of a pretrained sequence-to-sequence
model to the task of document ranking. Our approach is fundamentally different
from a commonly-adopted classification-based formulation of ranking, based on
encoder-only pretrained transformer architectures such as BERT. We show how a
sequence-to-sequence model can be trained to generate relevance labels as
"target words", and how the underlying logits of these target words can be
interpreted as relevance probabilities for ranking. On the popular MS MARCO
passage ranking task, experimental results show that our approach is at least
on par with previous classification-based models and can surpass them with
larger, more-recent models. On the test collection from the TREC 2004 Robust
Track, we demonstrate a zero-shot transfer-based approach that outperforms
previous state-of-the-art models requiring in-dataset cross-validation.
Furthermore, we find that our approach significantly outperforms an
encoder-only model in a data-poor regime (i.e., with few training examples). We
investigate this observation further by varying target words to probe the
model's use of latent knowledge.
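To make the scoring recipe concrete, the sketch below (a minimal illustration, not the authors' released code) shows how a T5-style model can score a query-document pair: the pair is serialized into the paper's "Query: ... Document: ... Relevant:" template, one decoding step is run, and a softmax over the logits of the target words "true" and "false" gives the relevance probability used for ranking. The checkpoint name, token-id lookup, and helper name `relevance_score` are assumptions for illustration.

```python
# Minimal sketch (not the authors' released code) of target-word relevance
# scoring with a generic Hugging Face T5 checkpoint; checkpoint name and
# token lookups below are illustrative assumptions.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

# Ids of the target words "true" / "false" (first SentencePiece piece of each).
TRUE_ID = tokenizer.encode("true", add_special_tokens=False)[0]
FALSE_ID = tokenizer.encode("false", add_special_tokens=False)[0]

def relevance_score(query: str, document: str) -> float:
    """Probability that the model emits "true" as the first target word."""
    # Input template described in the paper.
    text = f"Query: {query} Document: {document} Relevant:"
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Run a single decoding step from the decoder start token.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=start).logits[0, 0]
    # Softmax restricted to the two target-word logits -> relevance probability.
    probs = torch.softmax(logits[[TRUE_ID, FALSE_ID]], dim=0)
    return probs[0].item()
```

Candidate documents for a query can then be ranked by descending `relevance_score(query, document)`.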
Related papers
- StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention [2.66269503676104]
We introduce a novel fine-tuning method called stochastic cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
arXiv Detail & Related papers (2024-02-25T13:53:49Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Coarse-to-Fine Memory Matching for Joint Retrieval and Classification [0.7081604594416339]
We present a novel end-to-end language model for joint retrieval and classification.
We evaluate it on the standard blind test set of the FEVER fact verification dataset.
We extend exemplar auditing to this setting for analyzing and constraining the model.
arXiv Detail & Related papers (2020-11-29T05:06:03Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.