Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented
Large Language Models
- URL: http://arxiv.org/abs/2302.05578v2
- Date: Tue, 14 Feb 2023 23:53:56 GMT
- Title: Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented
Large Language Models
- Authors: Renat Aksitov, Chung-Ching Chang, David Reitter, Siamak Shakeri,
Yunhsuan Sung
- Abstract summary: We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
- Score: 6.425088990363101
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite recent progress, it has been difficult to prevent semantic
hallucinations in generative Large Language Models. One common solution to this
is augmenting LLMs with a retrieval system and making sure that the generated
output is attributable to the retrieved information. Given this newly added
constraint, it is plausible to expect that the overall quality of the output
will be affected, for example, in terms of fluency. Can scaling language models
help?
Here we examine the relationship between fluency and attribution in LLMs
prompted with retrieved evidence in knowledge-heavy dialog settings. Our
experiments were implemented with a set of auto-metrics that are aligned with
human preferences. They were used to evaluate a large set of generations,
produced under varying parameters of LLMs and supplied context.
We show that larger models tend to do much better in both fluency and
attribution, and that (naively) using top-k retrieval versus top-1 retrieval
improves attribution but hurts fluency. We next propose a recipe that could
allow smaller models to both close the gap with larger models and preserve the
benefits of top-k retrieval while avoiding its drawbacks.
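As a rough illustration of the top-1 versus top-k comparison discussed in the abstract, the sketch below shows how retrieved evidence passages might be concatenated into the prompt of a generic LLM. The prompt template, retriever, and model call are placeholders introduced here for illustration, not the authors' actual setup or metrics.

```python
from typing import Callable, List

def build_prompt(question: str, passages: List[str]) -> str:
    """Concatenate retrieved evidence into a grounded-answer prompt.

    Mirrors the generic 'prompt the LLM with retrieved evidence' setup
    described in the abstract; the exact template used in the paper is
    not specified, so this one is purely illustrative.
    """
    evidence = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the evidence below and cite the passage you used.\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer(question: str,
           retrieve: Callable[[str, int], List[str]],
           generate: Callable[[str], str],
           k: int = 1) -> str:
    """Retrieval-augmented generation: k=1 corresponds to the top-1
    condition, larger k to the top-k condition compared in the paper."""
    passages = retrieve(question, k)
    return generate(build_prompt(question, passages))

if __name__ == "__main__":
    # Stand-in retriever and LLM so the sketch runs on its own;
    # a real system would plug in a dense retriever and an actual model.
    corpus = ["Paris is the capital of France.",
              "France is a country in Western Europe.",
              "The Eiffel Tower is located in Paris."]
    fake_retrieve = lambda q, k: corpus[:k]
    fake_generate = lambda prompt: f"(model output for a {len(prompt)}-char prompt)"

    print(answer("What is the capital of France?", fake_retrieve, fake_generate, k=1))
    print(answer("What is the capital of France?", fake_retrieve, fake_generate, k=3))
```

In this framing, the paper's observation is that growing k tends to improve attribution (more of the answer can be grounded in some passage) while hurting fluency, which is the tradeoff the proposed recipe aims to resolve for smaller models.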
Related papers
- LargePiG: Your Large Language Model is Secretly a Pointer Generator [15.248956952849259]
We introduce relevance hallucination and factuality hallucination as a new typology for hallucination problems brought by query generation based on Large Language Models (LLMs).
We propose an effective way to separate content from form in LLM-generated queries, which preserves the factual knowledge extracted and integrated from the inputs and compiles the syntactic structure, including function words, using the powerful linguistic capabilities of the LLM.
arXiv Detail & Related papers (2024-10-15T07:41:40Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - Efficient Long-range Language Modeling with Self-supervised Causal Retrieval [39.24972628990943]
Grouped Cross-Attention is a novel module enabling joint pre-training of the retriever and causal LM.
By integrating top-$k$ retrieval, our model can be pre-trained efficiently from scratch with context lengths up to 64K tokens.
arXiv Detail & Related papers (2024-10-02T15:18:34Z) - Can We Use Large Language Models to Fill Relevance Judgment Holes? [9.208308067952155]
We take initial steps towards extending existing test collections by employing Large Language Models (LLMs) to fill the holes.
We find substantially lower correlations when human plus automatic judgments are used.
arXiv Detail & Related papers (2024-05-09T07:39:19Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Compressing Sentence Representation with maximum Coding Rate Reduction [0.0]
In most natural language inference problems, sentence representation is needed for semantic retrieval tasks.
Due to hardware limitations on space and time, there is a need to attain comparable results when using a smaller model.
We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
arXiv Detail & Related papers (2023-04-25T09:23:43Z) - On the Generalization Ability of Retrieval-Enhanced Transformers [1.0552465253379135]
Off-loading memory from trainable weights to a retrieval database can significantly improve language modeling.
It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval.
We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data.
arXiv Detail & Related papers (2023-02-23T16:11:04Z) - Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.