Enabling Large Language Models to Generate Text with Citations
- URL: http://arxiv.org/abs/2305.14627v2
- Date: Tue, 31 Oct 2023 15:04:35 GMT
- Title: Enabling Large Language Models to Generate Text with Citations
- Authors: Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen
- Abstract summary: Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
- Score: 37.64884969997378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have emerged as a widely-used tool for
information seeking, but their generated outputs are prone to hallucination. In
this work, our aim is to allow LLMs to generate text with citations, improving
their factual correctness and verifiability. Existing work mainly relies on
commercial search engines and human evaluation, making it challenging to
reproduce and compare different modeling approaches. We propose ALCE, the first
benchmark for Automatic LLMs' Citation Evaluation. ALCE collects a diverse set
of questions and retrieval corpora and requires building end-to-end systems to
retrieve supporting evidence and generate answers with citations. We develop
automatic metrics along three dimensions -- fluency, correctness, and citation
quality -- and demonstrate their strong correlation with human judgements. Our
experiments with state-of-the-art LLMs and novel prompting strategies show that
current systems have considerable room for improvement -- For example, on the
ELI5 dataset, even the best models lack complete citation support 50% of the
time. Our analyses further highlight promising future directions, including
developing better retrievers, advancing long-context LLMs, and improving the
ability to synthesize information from multiple sources.
Related papers
- Advancing Large Language Model Attribution through Self-Improving [32.77250400438304]
We present START, a framework for improving the attribution capability of large language models (LLMs)
START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation.
Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average.
arXiv Detail & Related papers (2024-10-17T07:55:33Z) - On the Capacity of Citation Generation by Large Language Models [38.47160164251295]
Retrieval-augmented generation (RAG) appears as a promising method to alleviate the "hallucination" problem in large language models (LLMs)
arXiv Detail & Related papers (2024-10-15T03:04:26Z) - Citekit: A Modular Toolkit for Large Language Model Citation Generation [20.509394248001723]
Large Language Models (LLMs) generate citations in Question-Answering (QA) tasks.
There is currently no unified framework to standardize and fairly compare different citation generation methods.
We introduce name, an open-source and modular toolkit designed to facilitate the implementation and evaluation of existing citation generation methods.
arXiv Detail & Related papers (2024-08-06T02:13:15Z) - Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
RAG has been widely adopted to enhance Large Language Models (LLMs)
Attributed Text Generation (ATG) has attracted growing attention, which provides citations to support the model's responses in RAG.
This paper proposes a fine-grained ATG method called ReClaim(Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering [9.86691461253151]
We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of large language models (LLMs)
Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers.
We present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
arXiv Detail & Related papers (2024-05-28T09:12:44Z) - Improving Attributed Text Generation of Large Language Models via Preference Learning [28.09715554543885]
We model the attribution task as preference learning and introduce an Automatic Preference Optimization framework.
APO achieves state-of-the-art citation F1 with higher answer quality.
arXiv Detail & Related papers (2024-03-27T09:19:13Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.