Related papers: Improving the Robustness of Large Language Models via Consistency Alignment

Improving the Robustness of Large Language Models via Consistency Alignment

URL: http://arxiv.org/abs/2403.14221v2
Date: Fri, 22 Mar 2024 12:34:47 GMT
Title: Improving the Robustness of Large Language Models via Consistency Alignment
Authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin,
Abstract summary: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. LLMs may generate significantly inconsistent responses due to minor changes in the verbalized instructions. We propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training.
Score: 36.24876571343749
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improvement in the robustness of response generation. However, systematic analysis and solutions are still lacking. In this paper, we quantitatively define the inconsistency problem and propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training. The first stage helps a model generalize on following instructions via similar instruction augmentations. In the second stage, we improve the diversity and help the model understand which responses are more aligned with human expectations by differentiating subtle differences in similar responses. The training process is accomplished by self-rewards inferred from the trained model at the first stage without referring to external human preference resources. We conduct extensive experiments on recent publicly available LLMs on instruction-following tasks and demonstrate the effectiveness of our training framework.

Related papers

Learning-to-Defer for Extractive Question Answering [3.6787328174619254]
We introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering. Our results demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to their larger counterparts while preserving computing efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
Revealing the Inherent Instructability of Pre-Trained Language Models [9.504992236994697]
We show that Response Tuning (RT) removes the instruction and its corresponding mapping to the response from instruction tuning. Our experiments demonstrate that RT, trained only on responses, can effectively respond to a wide range of instructions and exhibit helpfulness approaching that of their instruction-tuned counterparts.
arXiv Detail & Related papers (2024-10-03T13:15:19Z)
Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z)
Progress or Regress? Self-Improvement Reversal in Post-training [26.051637877066327]
We propose a comprehensive evaluative framework to scrutinize the underlying enhancements of post-training paradigms for self-improvement. We show that models showing improved performance across benchmarks will paradoxically exhibit declines in broader, essential capabilities. These findings indicate that current self-improvement practices through post-training are inadequate for equipping models to tackle more complex problems.
arXiv Detail & Related papers (2024-07-06T09:07:11Z)
YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework. It emulates the teacher-student education process to improve the efficacy of model fine-tuning. Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks [47.609417223514605]
This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models. Our empirical evaluation shows that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches.
arXiv Detail & Related papers (2023-11-14T12:12:25Z)
Instruction-following Evaluation through Verbalizer Manipulation [64.73188776428799]
We propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers.
arXiv Detail & Related papers (2023-07-20T03:54:24Z)
Entailment as Robust Self-Learner [14.86757876218415]
We design a prompting strategy that formulates a number of different NLU tasks as contextual entailment. We propose the Simple Pseudo-Label Editing (SimPLE) algorithm for better pseudo-labeling quality in self-training.
arXiv Detail & Related papers (2023-05-26T18:41:23Z)
Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats. We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attributes. We propose a novel search strategy based on the greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training. We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.