Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
- URL: http://arxiv.org/abs/2508.15396v1
- Date: Thu, 21 Aug 2025 09:36:35 GMT
- Title: Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
- Authors: Tobias Schreieder, Tim Schopf, Michael Färber
- Abstract summary: We introduce a unified taxonomy of evidence-based text generation with large language models. We investigate 300 evaluation metrics across seven key dimensions. We highlight open challenges and outline promising directions for future work.
- Score: 9.664217498808338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing adoption of large language models (LLMs) has been accompanied by growing concerns regarding their reliability and trustworthiness. As a result, a growing body of research focuses on evidence-based text generation with LLMs, aiming to link model outputs to supporting evidence to ensure traceability and verifiability. However, the field is fragmented due to inconsistent terminology, isolated evaluation practices, and a lack of unified benchmarks. To bridge this gap, we systematically analyze 134 papers, introduce a unified taxonomy of evidence-based text generation with LLMs, and investigate 300 evaluation metrics across seven key dimensions. In doing so, we focus on approaches that use citations, attribution, or quotations for evidence-based text generation. Building on this, we examine the distinctive characteristics and representative methods in the field. Finally, we highlight open challenges and outline promising directions for future work.
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z)
- Coordinated Semantic Alignment and Evidence Constraints for Retrieval-Augmented Generation with Large Language Models [4.023398871264227]
This paper proposes a retrieval-augmented generation method that integrates semantic alignment with evidence constraints. It improves factual reliability and verifiability while preserving natural language fluency.
arXiv Detail & Related papers (2026-03-04T22:21:04Z)
- Measuring what Matters: Construct Validity in Large Language Model Benchmarks [103.53142193393931]
Evaluating large language models (LLMs) is crucial both for assessing their capabilities and for identifying safety or robustness issues prior to deployment. We conduct a systematic review of 445 benchmarks from leading conferences in natural language processing and machine learning. We find patterns related to the measured phenomena, tasks, and scoring metrics that undermine the validity of the resulting claims.
arXiv Detail & Related papers (2025-11-03T17:39:40Z)
- TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification [32.958143806547234]
We introduce the Text pROVEnance (TROVE) challenge to trace each sentence of a target text back to specific source sentences. To benchmark TROVE, we construct our dataset by leveraging three public datasets covering 11 diverse scenarios. We evaluate 11 LLMs under direct prompting and retrieval-augmented paradigms.
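To make the provenance task concrete, the following is a minimal, hypothetical sketch of sentence-level source tracing: each target sentence is mapped to the source sentences it most resembles, here using a simple bag-of-words cosine similarity. This is only an illustration of the task setup; real provenance systems such as those benchmarked in TROVE use far stronger retrieval models or LLMs, and the function names and threshold below are assumptions, not from the paper.

```python
# Hypothetical sketch of sentence-level provenance tracing via
# bag-of-words cosine similarity (illustrative only; not TROVE's method).
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def trace(target_sents, source_sents, threshold=0.3):
    """Map each target sentence to indices of source sentences that likely support it."""
    src_vecs = [Counter(s.lower().split()) for s in source_sents]
    result = {}
    for t in target_sents:
        tv = Counter(t.lower().split())
        result[t] = [i for i, sv in enumerate(src_vecs) if cosine(tv, sv) >= threshold]
    return result
```

A lexical-overlap tracer like this will miss paraphrases, which is exactly why the benchmark evaluates LLMs and retrieval-augmented approaches instead.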
arXiv Detail & Related papers (2025-03-19T15:09:39Z)
- Unstructured Evidence Attribution for Long Context Query Focused Summarization [46.713307974729844]
Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query. We show how existing systems struggle to generate and properly cite unstructured evidence from their context.
arXiv Detail & Related papers (2025-02-20T09:57:42Z)
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling [63.98194996746229]
Large language models (LLMs) are prone to hallucination and producing factually incorrect information. We propose a novel framework, called Think&Cite, and formulate attributed text generation as a multi-step reasoning problem integrated with search.
arXiv Detail & Related papers (2024-12-19T13:55:48Z)
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language based on Textually Represented Environments [0.3874856507026475]
We address the need for automated claim validation based on the aggregated evidence derived from multiple online news sources.
We introduce an entity-centric reasoning framework in which latent connections between events, actions, or statements are revealed.
Our approach tries to fill the gap in automated claim validation for less-resourced languages.
arXiv Detail & Related papers (2024-07-13T13:30:20Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
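As a rough illustration of how precision and recall can be defined over unaligned generated and reference text, the sketch below follows the k-nearest-neighbor support estimation popularized by Kynkäänniemi et al. for generative models: precision asks how many generated samples fall inside the estimated support of the reference distribution, and recall asks the converse. This is an assumed, simplified formulation for embedded samples, not necessarily the paper's exact method.

```python
# Illustrative sketch (not the paper's exact formulation): embedding-space
# precision/recall via per-point nearest-neighbor support estimation.
from math import dist

def support_radius(points, k=1):
    """Per-point radius: distance to the k-th nearest other point in the set."""
    radii = []
    for p in points:
        ds = sorted(dist(p, q) for q in points if q is not p)
        radii.append(ds[k - 1] if ds else 0.0)
    return radii

def coverage(samples, reference):
    """Fraction of `samples` lying within the estimated support of `reference`."""
    radii = support_radius(reference)
    covered = sum(
        any(dist(s, r) <= rad for r, rad in zip(reference, radii))
        for s in samples
    )
    return covered / len(samples)

# precision = coverage(generated, real)  -> quality of generations
# recall    = coverage(real, generated)  -> diversity / mode coverage
```

Separating the two directions is what lets such a framework distinguish a model that writes fluent but repetitive text (high precision, low recall) from one that is diverse but off-distribution.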
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluate the faithfulness of machine-generated text by computing the longest non-continuous sub-sequence of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate the Longest Supported Subsequence (LSS).
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
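The underlying idea can be approximated with a classic longest-common-subsequence computation over tokens: the longest non-continuous sub-sequence of the claim that also appears, in order, in the context. The paper itself finetunes a model to generate the LSS rather than matching tokens exactly, so the sketch below is only a hypothetical token-level approximation, with faithfulness scored as the supported fraction of the claim.

```python
# Minimal sketch: approximate a Longest Supported Subsequence by the longest
# common token subsequence of claim and context (standard LCS dynamic
# programming). Illustrative only; the surveyed metric is model-generated.
def lss_tokens(claim: str, context: str) -> list[str]:
    a, b = claim.split(), context.split()
    m, n = len(a), len(b)
    # dp[i][j] = length of LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # backtrack to recover the supported tokens
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def faithfulness(claim: str, context: str) -> float:
    """Fraction of claim tokens covered by the supported subsequence."""
    return len(lss_tokens(claim, context)) / max(len(claim.split()), 1)
```

Because the subsequence need not be contiguous, the score credits a claim whose supported words are interleaved with unsupported ones, which a substring-based metric would miss.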
arXiv Detail & Related papers (2023-08-23T14:18:44Z)
- Enabling Large Language Models to Generate Text with Citations [37.64884969997378]
Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
arXiv Detail & Related papers (2023-05-24T01:53:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.