Training Data is More Valuable than You Think: A Simple and Effective
Method by Retrieving from Training Data
- URL: http://arxiv.org/abs/2203.08773v1
- Date: Wed, 16 Mar 2022 17:37:27 GMT
- Title: Training Data is More Valuable than You Think: A Simple and Effective
Method by Retrieving from Training Data
- Authors: Shuohang Wang, Yichong Xu, Yuwei Fang, Yang Liu, Siqi Sun, Ruochen Xu,
Chenguang Zhu, Michael Zeng
- Abstract summary: Retrieval-based methods have been shown to be effective in NLP tasks by introducing external knowledge.
Surprisingly, we found that REtrieving from the traINing datA (REINA) alone can lead to significant gains on multiple NLG and NLU tasks.
Experimental results show that this simple method can achieve significantly better performance on a variety of NLU and NLG tasks.
- Score: 82.92758444543689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-based methods have been shown to be effective in NLP tasks by introducing external knowledge. However, the indexing and retrieval of
large-scale corpora bring considerable computational cost. Surprisingly, we
found that REtrieving from the traINing datA (REINA) alone can lead to
significant gains on multiple NLG and NLU tasks. We retrieve the labeled
training instances most similar to the input text and then concatenate them
with the input to feed into the model to generate the output. Experimental
results show that this simple method can achieve significantly better
performance on a variety of NLU and NLG tasks, including summarization, machine
translation, language modeling, and question answering tasks. For instance, our
proposed method achieved state-of-the-art results on XSum, BigPatent, and
CommonsenseQA. Our code is released at https://github.com/microsoft/REINA .
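
The recipe is simple enough to sketch. The following minimal Python sketch illustrates the idea described above: index the labeled training set, retrieve the training instances most similar to the input, and concatenate them with the input before it is fed to the model. The BM25 retriever (via the rank_bm25 package), the [SEP] separator, and the truncation length are illustrative assumptions rather than the paper's exact implementation; see the released code for the real setup.

    # Sketch of REINA: retrieve labeled training instances similar to the input
    # and concatenate them with the input text before it goes into the model.
    # rank_bm25, the [SEP] separator, and max_len are illustrative choices.
    from rank_bm25 import BM25Okapi

    train_inputs = ["first training article ...", "second training article ..."]
    train_labels = ["its reference summary ...", "another reference summary ..."]

    # Index the training inputs once; queries are tokenized the same way.
    bm25 = BM25Okapi([x.lower().split() for x in train_inputs])

    def reina_augment(input_text: str, k: int = 2, max_len: int = 4096) -> str:
        """Concatenate the input with its top-k most similar (input, label) training pairs."""
        scores = bm25.get_scores(input_text.lower().split())
        top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        retrieved = " [SEP] ".join(f"{train_inputs[i]} {train_labels[i]}" for i in top_ids)
        # The augmented string is then fed to the usual seq2seq or classification model.
        return f"{input_text} [SEP] {retrieved}"[:max_len]

    print(reina_augment("an unseen test article about ..."))
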
Related papers
- RUIE: Retrieval-based Unified Information Extraction using Large Language Model [6.788855739199981]
Unified information extraction aims to complete all information extraction tasks using a single model or framework.
We propose RUIE (Retrieval-based Unified Information Extraction), a framework that leverages in-context learning to enable rapid generalization.
Experimental results on 8 held-out datasets demonstrate RUIE's effectiveness in generalizing to unseen tasks.
arXiv Detail & Related papers (2024-09-18T03:20:04Z)
- Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
arXiv Detail & Related papers (2024-08-21T17:59:05Z)
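
For context on the $k$NN-LM entry above: in its standard formulation (background knowledge, not stated in this abstract), a $k$NN-LM interpolates a nearest-neighbor distribution computed over cached training (context, next-word) pairs with the base model's next-word distribution:

    $p(w_t \mid c_t) = \lambda\, p_{\mathrm{kNN}}(w_t \mid c_t) + (1 - \lambda)\, p_{\mathrm{LM}}(w_t \mid c_t)$

where $p_{\mathrm{kNN}}$ is built from the $k$ retrieved neighbors of the current context representation and $\lambda$ is a fixed interpolation weight.
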
- Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers [56.12593882838412]
We introduce a novel instruction distillation method to rank documents.
We first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions into the pointwise approach with simpler instructions.
Our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods.
arXiv Detail & Related papers (2023-11-02T19:16:21Z)
- LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently.
We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction.
With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher with only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
- Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z)
- All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass [34.85886030306857]
In web content classification, multiple classification tasks are predicted from the same input text, such as a web article.
Existing multitask transformer models need to conduct N forward passes for N tasks with O(N) cost.
We propose a scalable method that can achieve stronger performance with close to O(1) computation cost via only one forward pass.
arXiv Detail & Related papers (2022-05-22T05:16:03Z)
- Learning To Retrieve Prompts for In-Context Learning [33.176481861880724]
We propose an efficient method for retrieving prompts for in-context learning using annotated data and an LM.
We evaluate our approach on three sequence-to-sequence tasks where language utterances are mapped to meaning representations.
arXiv Detail & Related papers (2021-12-16T05:17:56Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Bootstrapping Relation Extractors using Syntactic Search by Examples [47.11932446745022]
We propose a process for bootstrapping training datasets that can be performed quickly by non-NLP experts.
We take advantage of search engines over syntactic graphs, which expose a friendly by-example syntax.
We show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision.
arXiv Detail & Related papers (2021-02-09T18:17:59Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a sample from a small generator or not.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
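
To make the replaced-token-detection setup mentioned in the MC-BERT entry concrete, here is a minimal, self-contained sketch: a small generator proposes tokens at masked positions and a discriminator is trained to label every token as original or replaced. The random generator samples and stand-in discriminator logits are illustrative assumptions; MC-BERT's meta controller itself is not modeled here.

    # Sketch of ELECTRA-style replaced token detection (the setup MC-BERT builds on).
    # A random "generator" and stand-in discriminator logits are used for illustration.
    import torch
    import torch.nn.functional as F

    vocab_size, batch, seq_len = 30522, 2, 16
    tokens = torch.randint(0, vocab_size, (batch, seq_len))   # original token ids
    mask = torch.rand(batch, seq_len) < 0.15                  # ~15% of positions corrupted

    # 1) The generator proposes replacement tokens at the masked positions
    #    (random sampling here stands in for a small masked language model).
    generator_samples = torch.randint(0, vocab_size, (batch, seq_len))
    corrupted = torch.where(mask, generator_samples, tokens)

    # 2) Discriminator targets: 1 where the corrupted token differs from the original.
    #    Positions where the generator happens to reproduce the original stay labeled 0.
    is_replaced = (corrupted != tokens).float()

    # 3) The discriminator scores every token and is trained with per-token
    #    binary cross-entropy; these random logits stand in for its output.
    logits = torch.randn(batch, seq_len, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits, is_replaced)
    loss.backward()
    print(float(loss))
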