RARR: Researching and Revising What Language Models Say, Using Language
Models
- URL: http://arxiv.org/abs/2210.08726v3
- Date: Wed, 31 May 2023 17:55:02 GMT
- Title: RARR: Researching and Revising What Language Models Say, Using Language
Models
- Authors: Luyu Gao, Zhuyun Dai, Panupong Pasupat, Anthony Chen, Arun Tejasvi
Chaganty, Yicheng Fan, Vincent Y. Zhao, Ni Lao, Hongrae Lee, Da-Cheng Juan,
Kelvin Guu
- Abstract summary: We propose RARR (Retrofit Attribution using Research and Revision), a system that automatically finds attribution for the output of any text generation model.
We find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models.
- Score: 31.057495176599502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) now excel at many tasks such as few-shot learning,
question answering, reasoning, and dialog. However, they sometimes generate
unsupported or misleading content. A user cannot easily determine whether their
outputs are trustworthy or not, because most LMs do not have any built-in
mechanism for attribution to external evidence. To enable attribution while
still preserving all the powerful advantages of recent generation models, we
propose RARR (Retrofit Attribution using Research and Revision), a system that
1) automatically finds attribution for the output of any text generation model
and 2) post-edits the output to fix unsupported content while preserving the
original output as much as possible. When applied to the output of several
state-of-the-art LMs on a diverse set of generation tasks, we find that RARR
significantly improves attribution while otherwise preserving the original
input to a much greater degree than previously explored edit models.
Furthermore, the implementation of RARR requires only a handful of training
examples, a large language model, and standard web search.
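As a rough illustration of the pipeline the abstract describes, here is a minimal Python sketch of a research-and-revise loop. The helper functions (generate_queries, web_search, check_agreement, edit_passage) are hypothetical stand-ins for the paper's few-shot prompted LLM stages and web search, not the authors' code.

```python
# A minimal sketch of a RARR-style research-and-revise loop.
# All helpers below are stubbed placeholders, not the paper's implementation.

def generate_queries(passage: str) -> list[str]:
    """Research stage: ask an LLM for questions probing each factual
    claim in the passage (stubbed here)."""
    return [f"Is it true that: {passage}?"]

def web_search(query: str, k: int = 3) -> list[str]:
    """Evidence retrieval: fetch top-k snippets from a search engine
    (stubbed here)."""
    return [f"snippet for {query!r}"]

def check_agreement(passage: str, evidence: str) -> bool:
    """Agreement model: does the evidence support the passage?
    In the paper this is a few-shot prompted LLM; stubbed here."""
    return True

def edit_passage(passage: str, evidence: str) -> str:
    """Edit model: minimally rewrite the passage so the flagged claim
    matches the evidence (stubbed here)."""
    return passage

def rarr(passage: str) -> tuple[str, list[str]]:
    """Gather evidence, revise any unsupported claim, and return the
    revised passage plus the evidence that serves as its attribution."""
    attribution_report: list[str] = []
    for query in generate_queries(passage):
        for evidence in web_search(query):
            if not check_agreement(passage, evidence):
                passage = edit_passage(passage, evidence)
            attribution_report.append(evidence)
    return passage, attribution_report

if __name__ == "__main__":
    revised, report = rarr("The Eiffel Tower is 330 metres tall.")
    print(revised, report, sep="\n")
```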
Related papers
- LAQuer: Localized Attribution Queries in Content-grounded Generation [69.60308443863606]
Grounded text generation models often produce content that deviates from their source material, requiring user verification to ensure accuracy.
Existing attribution methods associate entire sentences with source documents, which can be overwhelming for users seeking to fact-check specific claims.
We introduce Localized Attribution Queries (LAQuer), a new task that localizes selected spans of generated output to their corresponding source spans, allowing fine-grained and user-directed attribution.
arXiv Detail & Related papers (2025-06-01T21:46:23Z)
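A toy baseline for the LAQuer task setting above: given a user-selected span of model output, find the most similar source span. This lexical-overlap matcher is an illustrative stand-in, not the paper's method.

```python
# Crude span localization by lexical overlap; an assumption-laden sketch,
# not LAQuer's actual approach.

def ngram_overlap(a: str, b: str) -> float:
    """Jaccard similarity over word sets (crude but dependency-free)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def localize(selected_span: str, source: str, window: int = 12) -> str:
    """Slide a fixed-size window over the source and return the span
    that best matches the selected output span."""
    words = source.split()
    candidates = [" ".join(words[i:i + window])
                  for i in range(max(len(words) - window + 1, 1))]
    return max(candidates, key=lambda c: ngram_overlap(selected_span, c))

source_doc = ("The report says revenue grew 12 percent in 2023 while "
              "operating costs fell, driven by cloud subscriptions.")
print(localize("revenue grew 12 percent", source_doc))
```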
- Think Before You Attribute: Improving the Performance of LLMs Attribution Systems [2.527698260421756]
We propose a sentence-level pre-attribution step for Retrieval-Augmented Generation (RAG) systems.
By separating sentences before attribution, a proper attribution method can be selected for each type of sentence, or the attribution can be skipped altogether.
arXiv Detail & Related papers (2025-05-19T02:08:20Z)
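A hedged sketch of the sentence-level routing idea described above: classify each generated sentence, then pick an attribution strategy or skip it. The heuristic rules below are invented placeholders, not the paper's classifier.

```python
# Toy pre-attribution router: route each sentence to an attribution
# strategy before running retrieval-based attribution.
import re

def classify(sentence: str) -> str:
    """Invented heuristic: sentences with numbers or two capitalized
    words look 'factual'; hedges look 'subjective'; the rest 'generic'."""
    if re.search(r"\d", sentence) or re.search(r"\b[A-Z][a-z]+ [A-Z][a-z]+", sentence):
        return "factual"
    if re.search(r"\b(may|might|arguably|in my view)\b", sentence, re.I):
        return "subjective"
    return "generic"

def route(sentences: list[str]) -> list[tuple[str, str]]:
    plan = []
    for s in sentences:
        kind = classify(s)
        if kind == "factual":
            plan.append((s, "attribute-with-retrieval"))
        elif kind == "subjective":
            plan.append((s, "attribute-lightly"))
        else:
            plan.append((s, "skip-attribution"))
    return plan

for sent, action in route(["Paris hosted the 2024 Olympics.",
                           "This may be the best approach.",
                           "Thanks for asking!"]):
    print(f"{action:26s} <- {sent}")
```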
- DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models [50.54264918467997]
Pre-trained language models (PLMs) have achieved impressive results on various natural language processing tasks.
Recent research has revealed that these models often rely on superficial features and shortcuts instead of developing a genuine understanding of language.
We propose Divergence Based Regularization (DBR) to mitigate this shortcut learning behavior.
arXiv Detail & Related papers (2025-02-25T16:44:10Z)
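One common shape a divergence-based debiasing regularizer can take, sketched with PyTorch: penalize the KL divergence between the model's predictions on an example and on a shortcut-perturbed version of it. This illustrates the general idea only; it is not the paper's exact loss.

```python
# Illustrative divergence-based regularizer (an assumption, not DBR's
# published objective): task loss plus a KL penalty that discourages the
# model from changing its prediction when only a shortcut feature changes.
import torch
import torch.nn.functional as F

def dbr_loss(logits: torch.Tensor,
             logits_perturbed: torch.Tensor,
             labels: torch.Tensor,
             lam: float = 0.1) -> torch.Tensor:
    task = F.cross_entropy(logits, labels)
    p = F.log_softmax(logits, dim=-1)            # log-probs on original input
    q = F.softmax(logits_perturbed, dim=-1)      # probs on perturbed input
    divergence = F.kl_div(p, q, reduction="batchmean")
    return task + lam * divergence

# Tiny smoke test with random tensors standing in for model outputs.
logits = torch.randn(4, 3, requires_grad=True)
logits_pert = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
loss = dbr_loss(logits, logits_pert, labels)
loss.backward()
print(float(loss))
```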
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
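A minimal sketch of the iterative lengthening idea: a generator drafts a response and an extender repeatedly expands it, with each round's output providing training signal for the next. Both model calls are stubbed placeholders; this shows the shape of the loop, not the authors' training code.

```python
# Iterative self-lengthening loop with stubbed LLM calls.

def generator(prompt: str) -> str:
    """Stub for the base LLM's short first draft."""
    return "A short draft answer."

def extender(prompt: str, draft: str) -> str:
    """Stub for the extender LLM that elaborates the current draft."""
    return draft + " ...expanded with more detail."

def self_lengthen(prompt: str, rounds: int = 3) -> list[str]:
    """Each round lengthens the previous output; in the paper the
    (prompt, lengthened output) pairs would then fine-tune the models."""
    text = generator(prompt)
    trajectory = [text]
    for _ in range(rounds):
        text = extender(prompt, text)
        trajectory.append(text)  # training targets grow round by round
    return trajectory

for step, out in enumerate(self_lengthen("Explain photosynthesis.")):
    print(step, len(out.split()), "words")
```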
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
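A sketch of subgraph retrieval as conditional generation: a small LM decodes relation paths for a question, and the paths are executed over the knowledge graph to collect a candidate subgraph. The path generator is a stub and the tiny KG is invented for illustration.

```python
# Generate-then-execute subgraph retrieval over a toy knowledge graph.

KG = {  # (head, relation) -> tails; invented example facts
    ("Inception", "directed_by"): ["Christopher Nolan"],
    ("Christopher Nolan", "born_in"): ["London"],
}

def generate_paths(question: str, k: int = 2) -> list[list[str]]:
    """Stand-in for a ~220M-parameter generator that decodes relation
    paths conditioned on the question."""
    return [["directed_by", "born_in"]]

def execute(entity: str, path: list[str]) -> list[tuple[str, str, str]]:
    """Walk the KG along the generated relation path, collecting triples."""
    frontier, triples = [entity], []
    for rel in path:
        next_frontier = []
        for head in frontier:
            for tail in KG.get((head, rel), []):
                triples.append((head, rel, tail))
                next_frontier.append(tail)
        frontier = next_frontier
    return triples

question = "Where was the director of Inception born?"
for path in generate_paths(question):
    print(execute("Inception", path))
```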
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
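A hedged sketch of the distribution-alignment idea behind logits-based black-box detection: fine-tune a small surrogate model on text sampled from the target LLM, then score candidate text with the surrogate's likelihood. Both model calls are stubbed placeholders, and the thresholding below is a simplification of real scoring functions.

```python
# Surrogate-alignment detection sketch with stubbed model calls.

def surrogate_logprob(text: str) -> float:
    """Stub for the surrogate's average token log-probability; a real
    implementation would run a small open LM over the text."""
    return -2.0 - 0.01 * len(text.split())

def align_surrogate(target_samples: list[str]) -> None:
    """Stub for the alignment step: standard language-model fine-tuning
    of the surrogate on outputs sampled from the black-box target."""
    pass

def detect(text: str, threshold: float = -2.5) -> bool:
    """Higher likelihood under the aligned surrogate suggests the text
    is machine-generated (a deliberately simplified decision rule)."""
    return surrogate_logprob(text) > threshold

align_surrogate(["sampled output one", "sampled output two"])
print(detect("The quick brown fox jumps over the lazy dog."))
```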
- UniRAG: Universal Retrieval Augmentation for Large Vision Language Models [76.30799731147589]
We introduce UniRAG, a plug-and-play technique that adds relevant retrieved information to prompts as few-shot examples during inference.
Contrary to the common belief that Retrieval Augmentation (RA) mainly improves generation or understanding of uncommon entities, our evaluation on the MSCOCO dataset with common entities shows that RA significantly improves generation quality for both proprietary models and smaller open-source models.
arXiv Detail & Related papers (2024-05-16T17:58:45Z)
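A sketch of the plug-and-play prompting pattern described above: retrieved (image, caption) pairs are prepended to the prompt as few-shot examples at inference time. The retriever is stubbed and the field names are invented for illustration.

```python
# Building a few-shot prompt from retrieved examples; no model changes.

def retrieve(query_image: str, k: int = 2) -> list[dict]:
    """Stub for a multimodal retriever returning similar examples."""
    return [{"image": "img_101.jpg", "caption": "A dog chasing a ball."},
            {"image": "img_007.jpg", "caption": "A cat on a windowsill."}][:k]

def build_prompt(query_image: str) -> str:
    shots = retrieve(query_image)
    lines = ["Caption each image."]
    for ex in shots:  # retrieved pairs become in-context demonstrations
        lines.append(f"Image: {ex['image']}\nCaption: {ex['caption']}")
    lines.append(f"Image: {query_image}\nCaption:")
    return "\n\n".join(lines)

print(build_prompt("query.jpg"))
```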
- RAR-b: Reasoning as Retrieval Benchmark [7.275757292756447]
We transform reasoning tasks into retrieval tasks to evaluate reasoning abilities stored in retriever models.
Recent decoder-based embedding models show great promise in narrowing the gap.
We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models.
arXiv Detail & Related papers (2024-04-09T14:34:48Z)
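A sketch of the reasoning-as-retrieval evaluation protocol: embed a reasoning question and its answer candidates, then check whether the correct answer ranks first by cosine similarity. The bag-of-words embedding here is a toy stand-in for a real retriever model.

```python
# Scoring a reasoning item as a retrieval problem with toy embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

question = "If all birds can fly and penguins are birds, can penguins fly?"
candidates = ["Yes, because the premise says all birds fly.",
              "No, penguins are flightless."]
q = embed(question)
ranked = sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)
print(ranked[0])  # accuracy@1 over many items yields the benchmark score
```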
- Generative Representational Instruction Tuning [89.76840377003178]
GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB).
GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models.
arXiv Detail & Related papers (2024-02-15T12:12:19Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
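A hedged sketch of retrieval in a VAE latent space, as the RegaVAE summary describes: encode the corpus to latent vectors once, retrieve neighbours for the source latent, and condition generation on the combined latents. The linear "encoder" below is a toy stand-in for a trained VAE encoder, not the paper's model.

```python
# Latent-space retrieval sketch with a toy encoder (an assumption-laden
# illustration, not RegaVAE's architecture).
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(64, 16)        # toy encoder: features -> latent mean

def encode(x: torch.Tensor) -> torch.Tensor:
    return encoder(x)                     # a real VAE also outputs a variance

corpus_feats = torch.randn(100, 64)       # pretend per-document features
corpus_latents = encode(corpus_feats)     # precomputed retrieval index

def retrieve_latents(source_latent: torch.Tensor, k: int = 4) -> torch.Tensor:
    sims = torch.nn.functional.cosine_similarity(
        corpus_latents, source_latent.unsqueeze(0), dim=-1)
    return corpus_latents[sims.topk(k).indices]

source = encode(torch.randn(64))
neighbours = retrieve_latents(source)
decoder_input = torch.cat([source.unsqueeze(0), neighbours])  # feeds the decoder
print(decoder_input.shape)
```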
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- In-Context Retrieval-Augmented Language Models [28.23702459322163]
We show that In-Context RALM builds on off-the-shelf general purpose retrievers to provide surprisingly large LM gains across model sizes and diverse corpora.
We conclude that In-Context RALM has considerable potential to increase the prevalence of LM grounding.
arXiv Detail & Related papers (2023-01-31T20:26:16Z)
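A sketch of the In-Context RALM recipe: no model changes, just prepend the top retrieved document to the frozen LM's input, refreshing the retrieval every few generated tokens. The retriever and LM calls are stubbed placeholders, and the stride value is an assumption.

```python
# In-context retrieval augmentation with periodic re-retrieval (stubs).

def retrieve(query: str) -> str:
    """Stub for an off-the-shelf retriever (e.g. BM25) over a corpus."""
    return "Background: relevant passage text."

def lm_generate(context: str, n_tokens: int) -> str:
    """Stub for a frozen LM generating n tokens given a context."""
    return " token" * n_tokens

def ralm_generate(prompt: str, total: int = 12, stride: int = 4) -> str:
    output = ""
    while len(output.split()) < total:
        # Re-retrieve using the most recent text as the query, then
        # simply prepend the document in-context; no fine-tuning.
        doc = retrieve(prompt + output)
        output += lm_generate(doc + "\n" + prompt + output, stride)
    return output

print(ralm_generate("The capital of France is"))
```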
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
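A loose sketch of the sparse latent typing idea: each token receives a distribution over latent types that includes a "null" type, and a sparsity penalty pushes most tokens toward the null type so only keywords receive real types. The dimensions and the penalty form below are assumptions, not the paper's exact objective.

```python
# Sparse token-typing sketch (assumed penalty form, not the paper's loss).
import torch
import torch.nn.functional as F

NUM_TYPES = 8                      # latent types; index 0 is the null type
hidden = torch.randn(1, 10, 32)    # (batch, tokens, dim) from an encoder
typer = torch.nn.Linear(32, NUM_TYPES)

type_logits = typer(hidden)
type_probs = F.softmax(type_logits, dim=-1)

# Sparsity regularizer: total probability mass on non-null types.
sparsity_loss = type_probs[..., 1:].sum(dim=-1).mean()

# Tokens whose argmax type is non-null are the extracted "keywords".
keywords = (type_probs.argmax(dim=-1) != 0).nonzero(as_tuple=True)[1]
print("sparsity loss:", float(sparsity_loss))
print("keyword token positions:", keywords.tolist())
```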