Tailoring Self-Rationalizers with Multi-Reward Distillation
- URL: http://arxiv.org/abs/2311.02805v2
- Date: Wed, 22 May 2024 19:01:10 GMT
- Title: Tailoring Self-Rationalizers with Multi-Reward Distillation
- Authors: Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren
- Abstract summary: Large language models (LMs) are capable of generating free-text rationales to aid question answering.
In this work, we enable small-scale LMs to generate rationales that improve downstream task performance.
Our method, MaRio, is a multi-reward conditioned self-rationalization algorithm.
- Score: 88.95781098418993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization is emergent only at significant scales (e.g., 175B-parameter GPT-3); and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., are they faithful, true, and helpful for humans? In this work, we enable small-scale LMs (approx. 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance, but are also more plausible, consistent, and diverse, as assessed by both automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties such as plausibility, diversity, and consistency. Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that MaRio not only improves task accuracy but also improves the self-rationalization quality of small LMs along the aforementioned axes better than a supervised fine-tuning (SFT) baseline. Extensive human evaluations confirm that MaRio rationales are preferred over SFT rationales and show qualitative improvements in plausibility and consistency.
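The abstract describes MaRio only at a high level. Below is a minimal sketch of what multi-reward conditioning can look like in practice, assuming a Quark-style quantized reward conditioning scheme: each rationale is scored by several reward functions, each score is binned into a quantile control token, and the LM is fine-tuned on control-token-prefixed inputs. The reward functions, token format, and binning here are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of constructing multi-reward conditioned training examples,
# assuming a Quark-style scheme: score each rationale with several reward
# functions, bin each score into a quantile control token, and prefix the
# input with those tokens before fine-tuning. Everything below is an
# illustrative assumption, not MaRio's exact implementation.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Sample:
    question: str
    rationale: str
    answer: str


def quantile_token(name: str, score: float, n_bins: int = 3) -> str:
    """Map a reward score in [0, 1] to a discrete control token, e.g. <plausibility_2>."""
    bin_id = min(int(score * n_bins), n_bins - 1)
    return f"<{name}_{bin_id}>"


def build_conditioned_example(
    sample: Sample,
    rewards: Dict[str, Callable[[Sample], float]],
) -> Dict[str, str]:
    """Prefix the question with one control token per reward so the LM can
    associate quality levels with tokens (and be steered at inference time)."""
    control = " ".join(
        quantile_token(name, fn(sample)) for name, fn in sorted(rewards.items())
    )
    return {
        "input": f"{control} question: {sample.question}",
        "target": f"rationale: {sample.rationale} answer: {sample.answer}",
    }


# Toy stand-ins for plausibility / consistency / diversity scorers.
rewards = {
    "plausibility": lambda s: 0.9,  # e.g. score from a plausibility classifier
    "consistency": lambda s: 0.6,   # e.g. does the rationale entail the answer?
    "diversity": lambda s: 0.3,     # e.g. n-gram novelty vs. other rationales
}

example = build_conditioned_example(
    Sample("Can a sunflower grow at night?", "Sunflowers need sunlight ...", "no"),
    rewards,
)
print(example["input"])  # <consistency_1> <diversity_0> <plausibility_2> question: ...
```

At inference time one would prepend the highest-bin control token for every property, asking the model for rationales that score well on all rewards at once.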
Related papers
- Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs).
We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales.
Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often underperforms compared to full-parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization [17.26418974819275]
This paper develops a new criterion that treats spurious features as plain noise.
Experiments show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to 10.4% compared to several recent competitive MMI variants.
arXiv Detail & Related papers (2024-10-08T13:04:02Z) - CERET: Cost-Effective Extrinsic Refinement for Text Generation [14.43795791836198]
We propose CERET, a method for refining text generations by considering semantic stability, entailment and inter-sample uncertainty measures.
Experimental results show that CERET outperforms Self-consistency and Self-rerank baselines consistently under various task setups.
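The summary above only names the three signals. As a loose illustration, the sketch below reranks candidate generations by a weighted combination of placeholder stability, entailment, and certainty scores; the scorers and weights are assumptions, not CERET's actual estimators.

```python
# Loose illustration of reranking candidate generations with a weighted
# combination of stability, entailment, and certainty scores, in the spirit
# of the CERET summary above. The scorers and weights are placeholders.
from typing import Callable, Dict, List


def rerank(
    candidates: List[str],
    scorers: Dict[str, Callable[[str, List[str]], float]],
    weights: Dict[str, float],
) -> str:
    """Return the candidate with the highest weighted score. Each scorer sees
    the whole pool so it can, e.g., measure agreement with the other samples."""
    def total(c: str) -> float:
        return sum(weights[k] * fn(c, candidates) for k, fn in scorers.items())
    return max(candidates, key=total)


scorers = {
    "stability": lambda c, pool: sum(c == other for other in pool) / len(pool),
    "entailment": lambda c, pool: 0.5,  # stand-in for an NLI model score
    "certainty": lambda c, pool: 0.5,   # stand-in for 1 - model uncertainty
}
weights = {"stability": 1.0, "entailment": 1.0, "certainty": 1.0}

print(rerank(["Paris", "Paris", "Lyon"], scorers, weights))  # Paris
```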
arXiv Detail & Related papers (2024-06-08T22:17:52Z) - Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought [51.240387516059535]
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
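As a rough, assumption-laden illustration of mixing the two kinds of reward the summary mentions, the snippet below combines a rationale-quality score and a task-success indicator into one scalar that a policy-gradient method could optimize; the specific terms and weighting are not taken from the paper.

```python
# Sketch of combining a rationale-oriented reward with a task-oriented reward
# into one scalar for policy optimization. The terms and the mixing weight
# are illustrative assumptions.
from typing import Dict


def combined_reward(
    rationale_scores: Dict[str, float],  # e.g. {"relevance": 0.8, "coherence": 0.6}
    task_correct: bool,                  # did the large LM answer correctly given the rationale?
    alpha: float = 0.5,                  # trade-off between rationale quality and task success
) -> float:
    rationale_reward = sum(rationale_scores.values()) / len(rationale_scores)
    task_reward = 1.0 if task_correct else 0.0
    return alpha * rationale_reward + (1.0 - alpha) * task_reward


print(combined_reward({"relevance": 0.8, "coherence": 0.6}, task_correct=True))  # 0.85
```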
arXiv Detail & Related papers (2024-04-04T12:46:37Z) - MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
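A small numerical sketch of the mini-ensemble idea follows, assuming the mini adapters act on disjoint slices of the weight matrix (a block-diagonal update) with arbitrary dimensions: stacking n mini rank-r adapters yields an update of rank up to n*r with far fewer parameters than a single LoRA of that rank.

```python
# Numerical sketch of the mini-ensemble idea: stack several tiny low-rank
# adapters along the diagonal so their ranks add while the parameter count
# stays small. Assumes a block-diagonal update; dimensions are arbitrary.
import numpy as np


def mini_ensemble_delta(d: int, n_minis: int, mini_rank: int, rng) -> np.ndarray:
    """Block-diagonal sum of n mini low-rank updates for a d x d weight matrix."""
    assert d % n_minis == 0
    block = d // n_minis
    delta = np.zeros((d, d))
    for i in range(n_minis):
        A = rng.standard_normal((block, mini_rank))
        B = rng.standard_normal((mini_rank, block))
        sl = slice(i * block, (i + 1) * block)
        delta[sl, sl] = A @ B
    return delta


rng = np.random.default_rng(0)
d, n, r = 64, 8, 2
delta = mini_ensemble_delta(d, n, r, rng)
print(int(np.linalg.matrix_rank(delta)))        # up to n * r = 16
# Trainable parameters: n mini adapters vs. one LoRA of the same total rank.
print(n * 2 * (d // n) * r, "vs", 2 * d * (n * r))  # 256 vs 2048
```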
arXiv Detail & Related papers (2024-02-27T07:14:12Z) - ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark, for both the task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z) - Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales [62.02328001381361]
We show that human utility of existing rationales is far from satisfactory, and expensive to estimate with human studies.
We translate this finding into an automated score, GEN-U, that can help improve LMs' ability to generate rationales with better human utility.
arXiv Detail & Related papers (2023-05-11T19:01:13Z) - SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
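The summary gives only the gist of the counterfactual objective; the sketch below shows one plausible way to realize it, pairing each teacher rationale with the answer it was asked to support, including wrong answers, so the student cannot ignore the rationale. The teacher call and text format are hypothetical placeholders, not SCOTT's actual implementation.

```python
# Hypothetical sketch of building counterfactual training pairs in the spirit
# of the summary above: pair each teacher rationale with the answer it was
# asked to support, including wrong answers, so a student predicting the
# answer from the rationale cannot ignore it.
from typing import Callable, List, Tuple


def build_pairs(
    question: str,
    answer_choices: List[str],
    gold: str,
    teacher_rationalize: Callable[[str, str], str],  # (question, answer) -> rationale
) -> List[Tuple[str, str, str]]:
    """Return (kind, input, target) triples: one factual pair for the gold
    answer and one counterfactual pair per wrong answer."""
    pairs = []
    for answer in answer_choices:
        rationale = teacher_rationalize(question, answer)  # rationale argued toward `answer`
        kind = "factual" if answer == gold else "counterfactual"
        pairs.append((kind, f"question: {question} rationale: {rationale}", f"answer: {answer}"))
    return pairs


# Toy teacher that just echoes which answer it was asked to justify.
toy_teacher = lambda q, a: f"Because the evidence points to {a}."
for kind, inp, tgt in build_pairs("Is the sky blue?", ["yes", "no"], "yes", toy_teacher):
    print(kind, "|", inp, "->", tgt)
```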
arXiv Detail & Related papers (2023-05-03T03:47:00Z) - FRAME: Evaluating Simulatability Metrics for Free-Text Rationales [26.58948555913936]
Free-text rationales aim to explain neural language model (LM) behavior more flexibly and intuitively via natural language.
To ensure rationale quality, it is important to have metrics for measuring rationales' faithfulness and plausibility.
We propose FRAME, a framework for evaluating free-text rationale simulatability metrics.
arXiv Detail & Related papers (2022-07-02T09:25:29Z)