Learning to Refine with Fine-Grained Natural Language Feedback
- URL: http://arxiv.org/abs/2407.02397v2
- Date: Thu, 03 Oct 2024 18:55:17 GMT
- Title: Learning to Refine with Fine-Grained Natural Language Feedback
- Authors: Manya Wadhwa, Xinyu Zhao, Junyi Jessy Li, Greg Durrett
- Abstract summary: We propose looking at refinement with feedback as a composition of three distinct LLM competencies.
A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors.
We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries.
- Score: 81.70313509881315
- Abstract: Recent work has explored the capability of large language models (LLMs) to identify and correct errors in LLM-generated responses. These refinement approaches frequently evaluate what sizes of models are able to do refinement for what problems, but less attention is paid to what effective feedback for refinement looks like. In this work, we propose looking at refinement with feedback as a composition of three distinct LLM competencies: (1) detection of bad generations; (2) fine-grained natural language critique generation; (3) refining with fine-grained feedback. The first step can be implemented with a high-performing discriminative model and steps 2 and 3 can be implemented either via prompted or fine-tuned LLMs. A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors, made possible by offloading the discrimination to a separate model in step 1. We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries. Overall, DCR consistently outperforms existing end-to-end refinement approaches and current trained models not fine-tuned for factuality critiquing.
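The three-step composition described in the abstract can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the function names (`dcr_refine`, `split_sentences`), the sentence-level granularity of detection, and the toy stand-in models are all assumptions made for the example. In practice the `detect`, `critique`, and `refine` callables would wrap a discriminative factuality model and prompted or fine-tuned LLMs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DCRResult:
    summary: str          # final (possibly refined) summary
    flagged: List[str]    # sentences the detector marked as unsupported
    critiques: List[str]  # one natural language critique per flagged sentence


def split_sentences(text: str) -> List[str]:
    # Naive splitter used only for this sketch (assumption).
    return [s.strip() for s in text.split(".") if s.strip()]


def dcr_refine(
    document: str,
    summary: str,
    detect: Callable[[str, str], bool],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, List[Tuple[str, str]]], str],
) -> DCRResult:
    # Step 1 (Detect): a separate discriminator decides which sentences are bad.
    flagged = [s for s in split_sentences(summary) if detect(document, s)]
    if not flagged:
        # Nothing detected, so the summary is returned unchanged.
        return DCRResult(summary, [], [])
    # Step 2 (Critique): fine-grained feedback only for the flagged sentences.
    critiques = [critique(document, s) for s in flagged]
    # Step 3 (Refine): rewrite the summary using the fine-grained feedback.
    refined = refine(document, summary, list(zip(flagged, critiques)))
    return DCRResult(refined, flagged, critiques)


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_detect(document: str, sentence: str) -> bool:
    return sentence not in document


def toy_critique(document: str, sentence: str) -> str:
    return f"'{sentence}' is not supported by the document."


def toy_refine(document: str, summary: str, feedback: List[Tuple[str, str]]) -> str:
    bad = {sentence for sentence, _ in feedback}
    kept = [s for s in split_sentences(summary) if s not in bad]
    return ". ".join(kept) + "."


result = dcr_refine(
    "The cat sat on the mat. It was sunny.",
    "The cat sat on the mat. It was raining.",
    toy_detect,
    toy_critique,
    toy_refine,
)
print(result.summary)
```

Keeping detection in its own callable mirrors the point made in the abstract: the critique step only has to explain errors that have already been found, rather than decide whether an error exists.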
Related papers
- RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.861013614500024]
We introduce a new benchmark designed to assess the critique capabilities of Large Language Models (LLMs).
Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques.
arXiv Detail & Related papers (2025-01-24T13:48:10Z)
- Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs).
We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales.
Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)
- MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning [60.55556283848063]
Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples.
Refinement offers an alternative by using LLM-generated feedback to improve solution quality.
We propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard.
arXiv Detail & Related papers (2024-09-18T17:12:41Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Large Language Models aren't all that you need [0.0]
This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II.
We evaluate two approaches: (a) a traditional Conditional Random Fields model and (b) a Large Language Model (LLM) fine-tuned with a customized head, and compare the two.
arXiv Detail & Related papers (2024-01-01T08:32:50Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLM) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics [5.516095889257118]
We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination.
This method involves refining model outputs through an ensemble of critics and the model's own feedback.
arXiv Detail & Related papers (2023-10-28T11:22:22Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that incorporates the most feedback; third, fine-tuning the language model to maximize the likelihood of the chosen refinement given the input.
We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback.
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
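One round of the iterative loop in the ILF entry above can be sketched as follows. The entry itself spells out the first step (generating refinements conditioned on the input, an initial output, and feedback); this sketch assumes the remaining two steps are selecting the refinement that best incorporates the feedback and fine-tuning on the chosen refinement. All names here (`ilf_round`, `generate_refinements`, `score_feedback_use`, `finetune`) are hypothetical placeholders, and the toy callables stand in for real models.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (input, initial output, feedback)


def ilf_round(
    examples: List[Example],
    generate_refinements: Callable[[str, str, str], List[str]],
    score_feedback_use: Callable[[str, str], float],
    finetune: Callable[[List[Tuple[str, str]]], None],
) -> List[Tuple[str, str]]:
    chosen: List[Tuple[str, str]] = []
    for inp, output, feedback in examples:
        # Step 1: condition on input, initial output, and feedback.
        candidates = generate_refinements(inp, output, feedback)
        # Step 2 (assumed): keep the candidate that uses the feedback most.
        best = max(candidates, key=lambda r: score_feedback_use(r, feedback))
        chosen.append((inp, best))
    # Step 3 (assumed): train on (input, chosen refinement) pairs.
    finetune(chosen)
    return chosen


# Toy stand-ins (hypothetical) so the sketch runs without a model.
training_log: List[Tuple[str, str]] = []

chosen = ilf_round(
    examples=[("q", "a", "fix")],
    generate_refinements=lambda inp, out, fb: [out, f"{out} {fb}"],
    score_feedback_use=lambda refinement, fb: float(fb in refinement),
    finetune=training_log.extend,
)
print(chosen)
```

Calling `ilf_round` repeatedly, with the fine-tuned model used to produce the next batch of initial outputs, would give the iterative behaviour the entry describes.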