Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
- URL: http://arxiv.org/abs/2405.06424v2
- Date: Sun, 19 May 2024 17:35:43 GMT
- Title: Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
- Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min,
- Abstract summary: We propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses.
Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training.
- Score: 12.921225188504643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.
Related papers
- Learning-to-Defer for Extractive Question Answering [3.6787328174619254]
We introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering.
Our results demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to their larger counterparts while preserving computing efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z) - Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models [6.394084132117747]
We propose a technique that leverages counterfactual generation to evaluate the faithfulness of attribution methods for autoregressive language models.
Our technique generates fluent, in-distribution counterfactuals, making the evaluation protocol more reliable.
arXiv Detail & Related papers (2024-08-21T00:17:59Z) - Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model, which target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
arXiv Detail & Related papers (2024-08-20T09:42:26Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of
Pre-trained Language Models with Proximal Policy Optimization [18.75866961339424]
ChatGPT has highlighted the potential of reinforcement learning from human feedback.
To reduce labor costs, we propose a self-supervised text ranking approach.
arXiv Detail & Related papers (2024-02-28T12:24:07Z) - Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect the incorrect translations and assigns a penalty term to the reward scores of them.
arXiv Detail & Related papers (2024-01-23T16:07:43Z) - Improving the Reliability of Large Language Models by Leveraging
Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination"
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - Ask Again, Then Fail: Large Language Models' Vacillations in Judgment [28.74246375289661]
We observe that current conversational language models often waver in their judgments when faced with follow-up questions.
We introduce a textscFollow-up Questioning Mechanism along with two metrics to quantify this inconsistency.
We develop a training-based framework textscUnwavering-FQ that teaches language models to maintain their originally correct judgments.
arXiv Detail & Related papers (2023-10-03T16:08:41Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty
Estimation [34.97086123805344]
Recent multilingual pre-trained language models have achieved remarkable zero-shot performance.
We propose a self-learning framework that further utilizes unlabeled data of target languages.
We evaluate our framework with uncertainties on two cross-lingual tasks including Named Entity Recognition (NER) and Natural Language Inference (NLI) covering 40 languages in total.
arXiv Detail & Related papers (2021-09-01T05:26:46Z) - NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task
Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.