Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
- URL: http://arxiv.org/abs/2503.01695v1
- Date: Mon, 03 Mar 2025 16:08:33 GMT
- Title: Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
- Authors: Kun Li, Tianhua Zhang, Yunxiang Li, Hongyin Luo, Abdalla Moustafa, Xixin Wu, James Glass, Helen Meng
- Abstract summary: GenDiE (Generate, Discriminate, Evolve) is a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. By treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches. Experiments on ASQA (in-domain LFQA) and ConFiQA datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness.
- Score: 61.80716438091887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving context faithfulness in large language models is essential for developing trustworthy retrieval augmented generation systems and mitigating hallucinations, especially in long-form question answering (LFQA) tasks or scenarios involving knowledge conflicts. Existing methods either intervene in LLMs only at inference without addressing their inherent limitations or overlook the potential for self-improvement. In this paper, we introduce GenDiE (Generate, Discriminate, Evolve), a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. GenDiE combines both generative and discriminative training, equipping LLMs with self-generation and self-scoring capabilities to facilitate iterative self-evolution. This supports both data construction for model alignment and score-guided search during inference. Furthermore, by treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches that optimize at the holistic answer level, which may miss unfaithful details. Experiments on ASQA (in-domain LFQA) and ConFiQA (out-of-domain counterfactual QA) datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness, and exhibits robust performance for domain adaptation.
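The score-guided search described in the abstract can be pictured with a short sketch. The following is a minimal, hypothetical Python illustration, not the paper's implementation: `generate_sentences` and `self_score` stand in for the single fine-tuned model's self-generation and self-scoring capabilities, and the greedy sentence-by-sentence loop is one simple instance of score-guided search.

```python
# Hypothetical sketch of GenDiE-style score-guided, sentence-level decoding.
# Both helpers are stubs standing in for one model's dual capabilities.

from typing import List

def generate_sentences(question: str, context: str, prefix: str, k: int) -> List[str]:
    """Stub: sample k candidate next sentences from the model."""
    return [f"[candidate {i} given prefix '{prefix[-20:]}']" for i in range(k)]

def self_score(question: str, context: str, prefix: str, sentence: str) -> float:
    """Stub: the model's own faithfulness score for a candidate sentence."""
    return 0.0  # replace with the model's discriminative score

def score_guided_decode(question: str, context: str,
                        max_sentences: int = 5, k: int = 4) -> str:
    """Greedy search: at each step, sample k candidate sentences and keep
    the one the model itself scores as most faithful to the context."""
    answer = ""
    for _ in range(max_sentences):
        candidates = generate_sentences(question, context, answer, k)
        best = max(candidates, key=lambda s: self_score(question, context, answer, s))
        answer += best + " "
    return answer.strip()
```

Per the abstract, the same self-scoring signal also ranks sampled sentences when constructing alignment data, which is what makes the iterative self-evolution possible.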
Related papers
- A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.
Challenges such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance necessitate advanced post-training language models (PoLMs) to address these shortcomings.
This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
- Ext2Gen: Alignment through Unified Extraction and Generation for Robust Retrieval-Augmented Generation [18.570899885235104]
We propose Ext2Gen, a novel extract-then-generate model that enhances RAG by extracting query-relevant sentences before generating answers.
Experiments demonstrate that Ext2Gen effectively identifies query-relevant sentences with high precision and recall, leading to highly reliable answers.
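A rough sketch of the extract-then-generate ordering, under stated assumptions: Ext2Gen is trained as a single aligned model, whereas this illustration uses two separate calls to a generic `llm` callable purely to make the two stages explicit.

```python
# Illustrative extract-then-generate pipeline in the spirit of Ext2Gen.
# The prompt layout and helper names are assumptions, not the paper's format.

def extract_then_generate(llm, query: str, chunks: list[str]) -> str:
    # Step 1: extract query-relevant sentences from the retrieved chunks
    # before committing to an answer.
    extraction_prompt = (
        f"Query: {query}\n\nDocuments:\n" + "\n".join(chunks) +
        "\n\nList the sentences that are relevant to the query."
    )
    evidence = llm(extraction_prompt)

    # Step 2: answer conditioned only on the extracted evidence, which
    # reduces distraction from irrelevant retrieved content.
    generation_prompt = (
        f"Query: {query}\n\nEvidence:\n{evidence}\n\nAnswer the query using the evidence."
    )
    return llm(generation_prompt)
```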
arXiv Detail & Related papers (2025-02-28T06:46:53Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models.
We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models.
Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
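One plausible instantiation of such a score, sketched below: confidence is measured as the mean KL divergence of each token's predictive distribution from the uniform distribution, so peaked (confident) distributions score higher. This formulation is an assumption for illustration; the paper's exact definition may differ.

```python
# Best-of-N selection with a self-certainty-style score and no reward model.

import math
from typing import List

def self_certainty(token_dists: List[List[float]]) -> float:
    """Mean KL(p || uniform) over the response's per-token distributions.
    KL(p || U) = sum_i p_i * log(p_i * V) for vocabulary size V."""
    v = len(token_dists[0])
    kl_sum = 0.0
    for p in token_dists:
        kl_sum += sum(pi * math.log(pi * v) for pi in p if pi > 0)
    return kl_sum / len(token_dists)

def best_of_n(responses: List[str], dists: List[List[List[float]]]) -> str:
    """Pick the sampled response whose own distributions look most certain."""
    scores = [self_certainty(d) for d in dists]
    return responses[max(range(len(responses)), key=scores.__getitem__)]
```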
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization [35.269343563526675]
We propose RHIO, a framework to teach large language models to explicitly discriminate between faithful and unfaithful generations.
RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads.
These samples are incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens.
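The head-masking idea can be sketched as follows; `run_model`, the control tokens, and the (layer, head) identifiers are hypothetical stand-ins rather than RHIO's actual interface.

```python
# Sketch: masking "retrieval heads" degrades context grounding, so generations
# produced that way serve as realistic unfaithful negatives for joint training.

FAITHFUL, UNFAITHFUL = "<faithful>", "<unfaithful>"

def run_model(prompt: str, masked_heads: frozenset = frozenset()) -> str:
    """Stub: generate text, optionally with (layer, head) attention heads zeroed."""
    return "..."  # replace with a real forward pass that masks the given heads

def build_contrastive_pair(question: str, context: str,
                           retrieval_heads: frozenset):
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    faithful = run_model(prompt)                                   # normal decoding
    unfaithful = run_model(prompt, masked_heads=retrieval_heads)   # heads masked
    # Training then conditions on control tokens so the model learns to
    # separate the two behaviors.
    return [(FAITHFUL + prompt, faithful), (UNFAITHFUL + prompt, unfaithful)]
```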
arXiv Detail & Related papers (2025-01-23T11:23:25Z)
- First Token Probability Guided RAG for Telecom Question Answering [15.854941373238226]
Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into Large Language Models (LLMs).
We propose a novel first token probability guided RAG framework to tackle the challenges of Multiple Choice Question Answering (MCQA) in telecommunications.
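A minimal sketch of first-token-probability selection for MCQA, assuming a hypothetical `first_token_logprobs` helper; the telecom-specific prompt design of the paper is not reproduced here.

```python
# For multiple-choice QA, read the model's probability of the first generated
# token over the option letters and pick the most likely one.

def first_token_logprobs(prompt: str) -> dict[str, float]:
    """Stub: log-probabilities of candidate first tokens for this prompt."""
    return {"A": -1.2, "B": -0.3, "C": -2.0, "D": -2.5}  # dummy values

def answer_mcq(question: str, options: dict[str, str], context: str) -> str:
    prompt = (
        f"Context: {context}\nQuestion: {question}\n" +
        "\n".join(f"{k}. {v}" for k, v in options.items()) +
        "\nAnswer:"
    )
    logprobs = first_token_logprobs(prompt)
    # Restrict to valid option letters; the winning letter is the prediction,
    # and its probability can also serve as a confidence signal.
    return max(options, key=lambda k: logprobs.get(k, float("-inf")))
```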
arXiv Detail & Related papers (2025-01-11T07:47:31Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
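A hedged sketch of the early-exit decoding that such intermediate-layer tuning enables; the confidence-threshold rule and helper names below are illustrative assumptions, not LITE's exact mechanism.

```python
# Early-exit next-token decoding: intermediate layers share the LM head, so
# decoding can stop at a shallower layer once its prediction is confident.

import torch

@torch.no_grad()
def early_exit_next_token(hidden_states: list,
                          lm_head: torch.nn.Linear,
                          exit_layers: list,
                          threshold: float = 0.9) -> int:
    """hidden_states[i]: last-position hidden state after layer i."""
    for layer in exit_layers:                      # shallow to deep
        probs = torch.softmax(lm_head(hidden_states[layer]), dim=-1)
        conf, token = probs.max(dim=-1)
        if conf.item() >= threshold:               # confident enough: exit early
            return token.item()
    # Fall back to the final layer if no intermediate layer is confident.
    probs = torch.softmax(lm_head(hidden_states[-1]), dim=-1)
    return probs.argmax(dim=-1).item()
```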
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG)
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
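The control flow can be caricatured in a few lines. The reflection-token names and stub helpers below are simplifications of Self-RAG's actual vocabulary of reflection tokens.

```python
# Simplified Self-RAG loop: the model emits reflection tokens that decide
# whether to retrieve and that critique its own outputs.

def model(prompt: str) -> str:
    return "[Retrieve] ..."                        # stub generation

def retrieve(query: str, k: int = 3) -> list:
    return [f"passage {i}" for i in range(k)]      # stub retriever

def self_rag(question: str) -> str:
    draft = model(question)
    if "[Retrieve]" in draft:                      # the model asked for evidence
        passages = retrieve(question)
        candidates = [model(f"{question}\n\nPassage: {p}") for p in passages]
        # Keep the candidate the model itself judges best supported; counting
        # a "[Supported]" token is a sketch-level stand-in for its critique.
        return max(candidates, key=lambda c: c.count("[Supported]"))
    return draft
```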
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
- SELF: Self-Evolution with Language Feedback [68.6673019284853]
'SELF' (Self-Evolution with Language Feedback) is a novel approach to advance large language models.
It enables LLMs to self-improve through self-reflection, akin to human learning processes.
Our experiments in mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention.
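A compact sketch of one self-evolution round in the spirit of SELF, with stubbed `generate` and `finetune` helpers; the real method's feedback and refinement prompts are more elaborate.

```python
# Self-evolution loop: the model critiques its own answer in natural language,
# revises it, and the revised pairs become training data for the next round.

def generate(model, prompt: str) -> str:
    return "stub output"               # replace with real model sampling

def finetune(model, data):
    return model                       # replace with a real fine-tuning step

def self_evolve(model, prompts: list, rounds: int = 2):
    for _ in range(rounds):
        data = []
        for p in prompts:
            answer = generate(model, p)
            feedback = generate(model, f"Critique this answer:\n{p}\n{answer}")
            revised = generate(model, f"Revise the answer using the feedback:\n"
                                      f"{p}\n{answer}\nFeedback: {feedback}")
            data.append((p, revised))
        model = finetune(model, data)  # evolve on self-refined outputs
    return model
```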
arXiv Detail & Related papers (2023-10-01T00:52:24Z)
- Lightweight reranking for language model generations [26.942659041383596]
We present a novel approach for reranking the generations of Large Language Models (LLMs).
Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy-to-compute pairwise statistics.
We show strong improvements for selecting the best k generations for code generation tasks as well as robust improvements for the best generation for the tasks of autoformalization, summarization, and translation.
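Because the statistics are cheap, the whole reranker fits in a few lines. The sketch below uses unigram Jaccard similarity as one easy-to-compute pairwise statistic and keeps the most "central" generations; the paper's exact statistic may differ.

```python
# Rerank N samples by average pairwise similarity: the generation that agrees
# most with the others is the safest pick, with no extra model calls.

from typing import List

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def rerank(generations: List[str], top_k: int = 1) -> List[str]:
    def centrality(i: int) -> float:
        others = [jaccard(generations[i], g)
                  for j, g in enumerate(generations) if j != i]
        return sum(others) / len(others) if others else 0.0
    order = sorted(range(len(generations)), key=centrality, reverse=True)
    return [generations[i] for i in order[:top_k]]
```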
arXiv Detail & Related papers (2023-07-11T17:51:48Z)
- Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference [82.28542500317445]
We present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues.
Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution.
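A deliberately coarse illustration only: SPI's posterior inference is approximated here by scoring knowledge candidates with the generator's own likelihood, which avoids a separate inference network; `loglik` and the draft-response device are assumptions of this sketch, not the paper's algorithm.

```python
# Couple knowledge selection and response generation in one model: the
# generator's likelihood acts as an unnormalized posterior over candidates.

def loglik(model, prompt: str, target: str) -> float:
    return -1.0                        # stub: log p(target | prompt) under model

def select_and_generate(model, context: str, knowledge: list,
                        draft_response: str) -> str:
    # Score each knowledge snippet by how well it explains a draft response,
    # a stand-in for posterior knowledge selection.
    scores = [loglik(model, f"{context}\nKnowledge: {k}", draft_response)
              for k in knowledge]
    best = knowledge[max(range(len(knowledge)), key=scores.__getitem__)]
    # Generate the final response grounded in the selected knowledge.
    return f"[response grounded in: {best}]"   # stub for model generation
```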
arXiv Detail & Related papers (2023-06-01T21:23:13Z)