Self-Review Framework for Enhancing Instruction Following Capability of LLM
- URL: http://arxiv.org/abs/2507.05598v1
- Date: Tue, 08 Jul 2025 02:17:18 GMT
- Title: Self-Review Framework for Enhancing Instruction Following Capability of LLM
- Authors: Sihyun Park,
- Abstract summary: Re5 is a self-evaluation and revision framework designed to enhance instruction-following performance.<n>We show that Re5 achieves instruction-following performance comparable to models trained on data generated by GPT-4o-mini.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various techniques have been proposed to improve large language models (LLMs) adherence to formatting and instruction constraints. One of the most effective approaches involves utilizing high-quality data generated by powerful models. However, such models often fail to fully comply with complex instructions in a single generation. To address this limitation, iterative revision methods have been introduced. Nevertheless, as the number of data points and revision iterations increases, the associated monetary costs grow significantly. As a resource-efficient alternative, methods have been proposed that leverage high-performance evaluation tools to compensate for the limited self-evaluation capabilities of open-source LLMs. However, these approaches often lead to a degradation in output quality due to excessive revision. To overcome these challenges, we propose Re5, a self-evaluation and revision framework designed to enhance instruction-following performance while preserving the quality of the generated content. Re5 extracts task and constraint components from user instructions, performs structural evaluations to prevent error accumulation, and applies fine-grained constraint-specific content evaluations followed by selective revisions. This process ensures precise and quality-preserving improvements. The final high-quality outputs are used for alignment tuning, enabling long-term alignment improvements through a data-centric iterative refinement loop. Experimental results demonstrate that Re5 achieves instruction-following performance comparable to models trained on data generated by GPT-4o-mini, a high-performance model, even with a small amount of data while maintaining response quality with a 64.24%-win rate over the non-revised initial responses. These results validate Re5 as an efficient and effective solution for enhancing instruction adherence with minimal external supervision.
Related papers
- Towards Efficient and Effective Alignment of Large Language Models [7.853945494882636]
Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge.<n>This thesis advances LLM alignment by introducing novel methodologies in data collection, training, and evaluation.
arXiv Detail & Related papers (2025-06-11T02:08:52Z) - Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task.<n>Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively.<n>We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
arXiv Detail & Related papers (2025-04-07T15:27:37Z) - Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.<n>However, improvement is plateauing due to the exhaustion of readily available high-quality data.<n>We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z) - Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.<n>In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.<n> DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z) - Self-Refinement Strategies for LLM-based Product Attribute Value Extraction [51.45146101802871]
This paper investigates applying two self-refinement techniques to the product attribute value extraction task.<n>The experiments show that both self-refinement techniques fail to significantly improve the extraction performance while substantially increasing processing costs.<n>For scenarios with development data, fine-tuning yields the highest performance, while the ramp-up costs of fine-tuning are balanced out as the amount of product descriptions increases.
arXiv Detail & Related papers (2025-01-02T12:55:27Z) - Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - Improving Retrieval Augmented Language Model with Self-Reasoning [20.715106330314605]
We propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs.<n>The framework involves constructing self-reason trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process.<n>We have evaluated our framework across four public datasets to demonstrate the superiority of our method.
arXiv Detail & Related papers (2024-07-29T09:05:10Z) - Don't Forget Your Reward Values: Language Model Alignment via
Value-based Calibration [26.467379188463028]
We propose a novel textbfValue-based textbfCalitextbfBration (VCB) method to better align Large Language Models with human preferences.
Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets.
arXiv Detail & Related papers (2024-02-25T08:45:10Z) - Improving Language Models via Plug-and-Play Retrieval Feedback [42.786225163763376]
Large language models (LLMs) exhibit remarkable performance across various NLP tasks.
They often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios.
We introduce ReFeed, a novel pipeline designed to enhance LLMs by providing automatic retrieval feedback in a plug-and-play framework.
arXiv Detail & Related papers (2023-05-23T12:29:44Z) - A Self-Refinement Strategy for Noise Reduction in Grammatical Error
Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.