Improving the Robustness of Large Language Models via Consistency Alignment
- URL: http://arxiv.org/abs/2403.14221v2
- Date: Fri, 22 Mar 2024 12:34:47 GMT
- Title: Improving the Robustness of Large Language Models via Consistency Alignment
- Authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin,
- Abstract summary: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses.
LLMs may generate significantly inconsistent responses due to minor changes in the verbalized instructions.
We propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training.
- Score: 36.24876571343749
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improvement in the robustness of response generation. However, systematic analysis and solutions are still lacking. In this paper, we quantitatively define the inconsistency problem and propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training. The first stage helps a model generalize on following instructions via similar instruction augmentations. In the second stage, we improve the diversity and help the model understand which responses are more aligned with human expectations by differentiating subtle differences in similar responses. The training process is accomplished by self-rewards inferred from the trained model at the first stage without referring to external human preference resources. We conduct extensive experiments on recent publicly available LLMs on instruction-following tasks and demonstrate the effectiveness of our training framework.
Related papers
- Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models.<n>We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks.
arXiv Detail & Related papers (2025-07-13T19:36:17Z) - Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning [43.12759195699103]
Large Language Models (LLMs) have achieved remarkable performance across various reasoning tasks, yet post-training is constrained by inefficient sample utilization and inflexible difficulty samples processing.<n>We propose Customized Curriculum Learning (CCL), a novel framework with two key innovations.<n>First, we introduce model-adaptive difficulty definition that customizes curriculum datasets based on each model's individual capabilities rather than using predefined difficulty metrics.<n>Second, we develop "Guided Prompting," which dynamically reduces sample difficulty through strategic hints, enabling effective utilization of challenging samples that would otherwise degrade performance.
arXiv Detail & Related papers (2025-06-04T15:31:46Z) - Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models [79.90523648823522]
Multi-stage continual learning can lead to catastrophic forgetting.<n>This paper evaluates three mitigation strategies-model merging, discounting the LoRA scaling factor, and experience replay.<n>Results show that experience replay is the most effective, with further gains achieved by combining it with other methods.
arXiv Detail & Related papers (2025-05-23T05:50:14Z) - Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling problem's difficulty prior information shapes the effectiveness of reinforcement learning based fine-tuning for multimodal reasoning.<n>Our approach demonstrates significant performances across various multi-modal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z) - Learning-to-Defer for Extractive Question Answering [3.6787328174619254]
We introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering.
Our results demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to their larger counterparts while preserving computing efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z) - Revealing the Inherent Instructability of Pre-Trained Language Models [9.504992236994697]
We show that Response Tuning (RT) removes the instruction and its corresponding mapping to the response from instruction tuning.
Our experiments demonstrate that RT, trained only on responses, can effectively respond to a wide range of instructions and exhibit helpfulness approaching that of their instruction-tuned counterparts.
arXiv Detail & Related papers (2024-10-03T13:15:19Z) - Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models.
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z) - Progress or Regress? Self-Improvement Reversal in Post-training [26.051637877066327]
We propose a comprehensive evaluative framework to scrutinize the underlying enhancements of post-training paradigms for self-improvement.
We show that models showing improved performance across benchmarks will paradoxically exhibit declines in broader, essential capabilities.
These findings indicate that current self-improvement practices through post-training are inadequate for equipping models to tackle more complex problems.
arXiv Detail & Related papers (2024-07-06T09:07:11Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training
with Adversarial Remarks [47.609417223514605]
This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models.
Our empirical evaluation shows that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches.
arXiv Detail & Related papers (2023-11-14T12:12:25Z) - Self-Convinced Prompting: Few-Shot Question Answering with Repeated
Introspection [13.608076739368949]
We introduce a novel framework that harnesses the potential of large-scale pre-trained language models.
Our framework processes the output of a typical few-shot chain-of-thought prompt, assesses the correctness of the response, scrutinizes the answer, and ultimately produces a new solution.
arXiv Detail & Related papers (2023-10-08T06:36:26Z) - Instruction-following Evaluation through Verbalizer Manipulation [64.73188776428799]
We propose a novel instruction-following evaluation protocol called verbalizer manipulation.
It instructs the model to verbalize the task label with words aligning with model priors to different extents.
We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers.
arXiv Detail & Related papers (2023-07-20T03:54:24Z) - Entailment as Robust Self-Learner [14.86757876218415]
We design a prompting strategy that formulates a number of different NLU tasks as contextual entailment.
We propose the Simple Pseudo-Label Editing (SimPLE) algorithm for better pseudo-labeling quality in self-training.
arXiv Detail & Related papers (2023-05-26T18:41:23Z) - Enhancing Large Language Models Against Inductive Instructions with
Dual-critique Prompting [55.15697111170836]
This paper reveals the behaviors of large language models (LLMs) towards textitinductive instructions and enhance their truthfulness and helpfulness accordingly.
After extensive human and automatic evaluations, we uncovered a universal vulnerability among LLMs in processing inductive instructions.
We identify that different inductive styles affect the models' ability to identify the same underlying errors, and the complexity of the underlying assumptions also influences the model's performance.
arXiv Detail & Related papers (2023-05-23T06:38:20Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attributes.
We propose a novel search strategy based on the greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.