Think Before You Attribute: Improving the Performance of LLMs Attribution Systems
- URL: http://arxiv.org/abs/2505.12621v1
- Date: Mon, 19 May 2025 02:08:20 GMT
- Title: Think Before You Attribute: Improving the Performance of LLMs Attribution Systems
- Authors: João Eduardo Batista, Emil Vatai, Mohamed Wahib
- Abstract summary: We propose a sentence-level pre-attribution step for Retrieval-Augmented Generation (RAG) systems. By triaging sentences before attribution, an attribution method suited to each sentence type can be selected, or attribution can be skipped altogether.
- Score: 2.527698260421756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly applied in various science domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability and accountability are non-negotiable. To be reliable, attribution systems need high accuracy and should retrieve short spans, i.e., attribute to a sentence within a document rather than to the whole document. We propose a sentence-level pre-attribution step for Retrieval-Augmented Generation (RAG) systems that classifies sentences into three categories: not attributable, attributable to a single quote, and attributable to multiple quotes. By triaging sentences before attribution, an attribution method suited to each sentence type can be selected, or attribution can be skipped altogether. Our results indicate that classifiers are well-suited for this task. In this work, we propose a pre-attribution step that reduces the computational cost of attribution, provide a cleaned version of the HAGRID dataset, and provide an end-to-end attribution system that works out of the box.
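To make the gating idea concrete, here is a minimal sketch of how such a pre-attribution step could sit in a RAG pipeline. The three labels follow the abstract; the classifier and the two attribution back-ends (`classify`, `attribute_single`, `attribute_multi`) are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of a sentence-level pre-attribution gate for a RAG pipeline.
# The three labels mirror the abstract; classify/attribute_single/attribute_multi
# are assumed, illustrative components rather than the paper's actual code.
from enum import Enum
from typing import Callable, Dict, List, Optional, Union

class PreAttrLabel(Enum):
    NOT_ATTRIBUTABLE = 0  # e.g. transitions or opinions: skip attribution entirely
    SINGLE_QUOTE = 1      # attributable to one source sentence
    MULTI_QUOTE = 2       # needs evidence from several source sentences

def pre_attribute(
    sentences: List[str],
    classify: Callable[[str], PreAttrLabel],
    attribute_single: Callable[[str], str],
    attribute_multi: Callable[[str], List[str]],
) -> Dict[str, Optional[Union[str, List[str]]]]:
    """Route each generated sentence to the cheapest adequate attribution method."""
    results: Dict[str, Optional[Union[str, List[str]]]] = {}
    for sent in sentences:
        label = classify(sent)
        if label is PreAttrLabel.NOT_ATTRIBUTABLE:
            results[sent] = None                    # no retrieval cost at all
        elif label is PreAttrLabel.SINGLE_QUOTE:
            results[sent] = attribute_single(sent)  # cheap single-quote search
        else:
            results[sent] = attribute_multi(sent)   # heavier multi-quote search
    return results
```

The cost saving comes from the first branch: sentences the classifier deems non-attributable never touch the retrieval index.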
Related papers
- LAQuer: Localized Attribution Queries in Content-grounded Generation [69.60308443863606]
Grounded text generation models often produce content that deviates from their source material, requiring user verification to ensure accuracy. Existing attribution methods associate entire sentences with source documents, which can be overwhelming for users seeking to fact-check specific claims. We introduce Localized Attribution Queries (LAQuer), a new task that localizes selected spans of generated output to their corresponding source spans, allowing fine-grained and user-directed attribution.
arXiv Detail & Related papers (2025-06-01T21:46:23Z)
- Attribute or Abstain: Large Language Models as Long Document Assistants [58.32043134560244]
LLMs can help humans working with long documents, but are known to hallucinate.
Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance.
This is crucially different from the long document setting, where retrieval is not needed, but could help.
We present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiments with different approaches to attribution on 5 LLMs of different sizes.
arXiv Detail & Related papers (2024-07-10T16:16:02Z)
- WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations [34.99831757956635]
We formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset featuring 7k human-annotated summaries with citations.
We address the challenges of attribution evaluation by developing detailed metrics and enabling the automatic evaluator to decompose sentences into sub-claims for fine-grained verification.
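A rough sketch of the sub-claim verification idea follows, assuming an LLM-based claim splitter (`decompose`) and an NLI-style entailment check (`entails`); both are stand-ins rather than the WebCiteS evaluator.

```python
# Sketch of sub-claim-level citation verification: a summary sentence is split
# into atomic claims, and each claim must be entailed by at least one cited
# source. decompose() and entails() are assumed helper models, not WebCiteS code.
from typing import Callable, Dict, List

def verify_citations(
    sentence: str,
    cited_sources: List[str],
    decompose: Callable[[str], List[str]],  # sentence -> atomic sub-claims
    entails: Callable[[str, str], bool],    # (source, claim) -> supported?
) -> Dict[str, bool]:
    """Check each sub-claim of a summary sentence against its cited sources."""
    return {
        claim: any(entails(source, claim) for source in cited_sources)
        for claim in decompose(sentence)
    }
```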
arXiv Detail & Related papers (2024-03-04T07:06:41Z)
- Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation [28.89786334298637]
We develop a novel method to optimize LLMs using ranking metrics.
Rather than a traditional full ordering, we advocate for a partial ordering.
We test our system's improved response generation ability using benchmark datasets.
arXiv Detail & Related papers (2023-11-15T17:27:14Z)
- SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window [24.524282909076767]
We present SLIDE (SLIding Document Evaluator), a metric that operates on blocks of sentences.
We find that SLIDE obtains significantly higher pairwise system accuracy than its sentence-level baseline.
This suggests that source context may provide the same information as a human reference in disambiguating source ambiguities.
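A sketch of the sliding-window construction, assuming a reference-free block scorer (`score_block`) and illustrative window/stride defaults; none of these specifics come from the SLIDE paper itself.

```python
# Sketch of a SLIDE-style sliding window: score blocks of consecutive sentences
# (source paired with hypothesis) and average, instead of scoring sentences in
# isolation. score_block and the window/stride defaults are assumptions.
from typing import Callable, List

def slide_score(
    src_sents: List[str],
    hyp_sents: List[str],
    score_block: Callable[[str, str], float],  # reference-free quality estimator
    window: int = 6,
    stride: int = 3,
) -> float:
    """Average block-level scores over a window sliding through the document."""
    scores = []
    # Assumes source and hypothesis sentences are aligned one-to-one.
    for start in range(0, max(1, len(src_sents) - window + 1), stride):
        src_block = " ".join(src_sents[start:start + window])
        hyp_block = " ".join(hyp_sents[start:start + window])
        scores.append(score_block(src_block, hyp_block))
    return sum(scores) / len(scores)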
arXiv Detail & Related papers (2023-09-16T01:30:58Z)
- RARR: Researching and Revising What Language Models Say, Using Language Models [31.057495176599502]
We propose RARR (Retrofit Attribution using Research and Revision), a system that automatically finds attribution for the output of any text generation model.
We find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models.
arXiv Detail & Related papers (2022-10-17T03:44:30Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on entity retrieval and post-editing.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects entity-level errors in the summary by checking it against the evidence sentences, and substitutes wrong entities with accurate ones drawn from the evidence.
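A simplified sketch of the described pipeline; the word-overlap retrieval and the `extract_entities`/`pick_replacement` helpers are naive stand-ins for the paper's components.

```python
# Sketch of an RFEC-style post-editor: retrieve evidence sentences for a summary
# sentence, then replace entities the evidence does not support. The overlap
# ranking and both helper callables are simplistic assumptions, not RFEC itself.
from typing import Callable, List, Optional

def retrieve_evidence(summary_sent: str, doc_sents: List[str], k: int = 3) -> List[str]:
    """Rank source sentences by word overlap with the summary sentence."""
    s_tokens = set(summary_sent.lower().split())
    ranked = sorted(doc_sents, key=lambda d: -len(s_tokens & set(d.lower().split())))
    return ranked[:k]

def correct_entities(
    summary_sent: str,
    doc_sents: List[str],
    extract_entities: Callable[[str], List[str]],                 # NER stand-in
    pick_replacement: Callable[[str, List[str]], Optional[str]],  # picks evidence entity
) -> str:
    """Replace summary entities that are unsupported by retrieved evidence."""
    evidence = retrieve_evidence(summary_sent, doc_sents)
    evidence_entities = [e for sent in evidence for e in extract_entities(sent)]
    corrected = summary_sent
    for entity in extract_entities(summary_sent):
        if entity not in evidence_entities:
            replacement = pick_replacement(entity, evidence_entities)
            if replacement is not None:
                corrected = corrected.replace(entity, replacement)
    return corrected
```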
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug in queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)