Making Large Language Models Better Reasoners with Alignment
- URL: http://arxiv.org/abs/2309.02144v1
- Date: Tue, 5 Sep 2023 11:32:48 GMT
- Title: Making Large Language Models Better Reasoners with Alignment
- Authors: Peiyi Wang and Lei Li and Liang Chen and Feifan Song and Binghuai Lin
and Yunbo Cao and Tianyu Liu and Zhifang Sui
- Abstract summary: Reasoning is a cognitive process of using evidence to reach a sound conclusion.
Recent studies reveal that fine-tuning LLMs on data with the chain of thought (COT) reasoning process can significantly enhance their reasoning capabilities.
We introduce an Alignment Fine-Tuning (AFT) paradigm, which involves three steps.
- Score: 57.82176656663245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning is a cognitive process of using evidence to reach a sound
conclusion. The reasoning capability is essential for large language models
(LLMs) to serve as the brain of an artificial general intelligence agent.
Recent studies reveal that fine-tuning LLMs on data with the chain of thought
(COT) reasoning process can significantly enhance their reasoning capabilities.
However, we find that the fine-tuned LLMs suffer from an \textit{Assessment
Misalignment} problem, i.e., they frequently assign higher scores to subpar
COTs, leading to potential limitations in their reasoning abilities. To address
this problem, we introduce an \textit{Alignment Fine-Tuning (AFT)} paradigm,
which involves three steps: 1) fine-tuning LLMs with COT training data; 2)
generating multiple COT responses for each question, and categorizing them into
positive and negative ones based on whether they achieve the correct answer; 3)
calibrating the scores of positive and negative responses given by LLMs with a
novel constraint alignment loss. Specifically, the constraint alignment loss
has two objectives: a) Alignment, which guarantees that positive scores surpass
negative scores to encourage answers with high-quality COTs; b) Constraint,
which keeps the negative scores confined to a reasonable range to prevent
model degradation. Beyond binary positive and negative feedback, the
constraint alignment loss can be seamlessly adapted to ranking settings
when ranking feedback is available. Furthermore, we delve deeply into
recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and
discover that the constraint, which has been overlooked by these approaches, is
also crucial for their performance. Extensive experiments on four reasoning
benchmarks with both binary and ranking feedback demonstrate the effectiveness
of AFT.
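
To make the two objectives concrete, below is a minimal PyTorch sketch of a constraint-alignment-style loss for the binary feedback case. It is not the paper's exact formulation: the length-normalized sequence score, the hinge `margin`, and the `floor` constraint are illustrative assumptions chosen only to show how an alignment term (positives above negatives) and a constraint term (negatives not pushed arbitrarily low) can be combined.

```python
# Hedged sketch of a constraint-alignment-style loss for binary CoT feedback.
# NOT the exact AFT loss from the paper; `margin` and `floor` are illustrative
# assumptions, as is scoring a CoT by its length-normalized log-likelihood.
import torch


def sequence_score(logprobs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-likelihood of each sampled CoT.

    logprobs: (num_cots, seq_len) per-token log-probabilities under the model.
    mask:     (num_cots, seq_len) 1.0 for response tokens, 0.0 elsewhere.
    """
    return (logprobs * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)


def constraint_alignment_loss(pos_scores: torch.Tensor,
                              neg_scores: torch.Tensor,
                              margin: float = 0.1,
                              floor: float = -5.0) -> torch.Tensor:
    """Two-part loss over the scores of the CoTs sampled for one question.

    pos_scores: (P,) scores of CoTs that reached the correct answer.
    neg_scores: (N,) scores of CoTs that did not.
    """
    # (a) Alignment: each positive score should exceed each negative score
    #     by at least `margin` (hinge over all positive/negative pairs).
    pairwise_diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # (P, N)
    alignment = torch.relu(margin - pairwise_diff).mean()

    # (b) Constraint: penalize negative scores that fall below `floor`, so
    #     they are not pushed arbitrarily low (the degradation the abstract
    #     warns about).
    constraint = torch.relu(floor - neg_scores).mean()

    return alignment + constraint


# Toy usage: scores for 2 positive and 3 negative CoTs for one question.
pos = torch.tensor([-1.2, -1.5])
neg = torch.tensor([-1.4, -2.0, -3.1])
loss = constraint_alignment_loss(pos, neg)
```

One way to mirror the ranking-feedback adaptation mentioned in the abstract is to apply the same pairwise hinge between consecutively ranked responses while keeping the constraint term unchanged; again, this is a sketch of the idea, not the paper's formulation.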
Related papers
- Automatic Curriculum Expert Iteration for Reliable LLM Reasoning [60.60318625779015]
Hallucinations (i.e., generating plausible but inaccurate content) and laziness (i.e., excessive refusals or defaulting to "I don't know") persist as major challenges in LLM reasoning.
Current efforts to reduce hallucinations primarily focus on factual errors in knowledge-grounded tasks, often neglecting hallucinations related to faulty reasoning.
We propose Automatic Curriculum Expert Iteration (Auto-CEI) to enhance LLM reasoning and align responses to the model's capabilities.
arXiv Detail & Related papers (2024-10-10T05:43:07Z)
- As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss [26.860139372768092]
We propose a novel alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization.
Our proposed BNF loss eliminates the need for pairwise contrastive losses.
We conduct extensive experiments across two challenging QA benchmarks and four reasoning benchmarks.
arXiv Detail & Related papers (2024-10-07T08:44:04Z)
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options [2.1184929769291294]
This study explores whether large language models (LLMs) prioritize following instructions over reasoning and truth when given "misleading" instructions.
We introduce a new metric called "reflective judgment", which sheds new light on the relationship between the pre-training and post-training alignment schemes.
arXiv Detail & Related papers (2024-08-27T19:27:43Z)
- Don't Say No: Jailbreaking LLM by Suppressing Refusal [13.666830169722576]
In this study, we first uncover the reason why vanilla target loss is not optimal, then we explore and enhance the loss objective and introduce the DSN (Don't Say No) attack.
Existing evaluations such as refusal keyword matching reveal numerous false positive and false negative instances.
To overcome this challenge, we propose an Ensemble Evaluation pipeline that incorporates Natural Language Inference (NLI) contradiction assessment and two external LLM evaluators.
arXiv Detail & Related papers (2024-04-25T07:15:23Z)
- Negating Negatives: Alignment with Human Negative Samples via Distributional Dispreference Optimization [37.8788435790632]
Large language models (LLMs) have revolutionized the role of AI, yet pose potential social risks.
Existing methods rely on high-quality positive-negative training pairs, suffering from noisy positive responses that are barely distinguishable from negative ones.
We propose Distributional Dispreference Optimization (D$^2$O), which maximizes the discrepancy between dispreferred responses and the generated non-negative ones.
arXiv Detail & Related papers (2024-03-06T03:02:38Z)
- Reasons to Reject? Aligning Language Models with Judgments [72.39858230784002]
We explore the use of language feedback to align large language models (LLMs).
We propose Contrastive Unlikelihood Training (CUT) that allows for fine-grained inappropriate content detection and correction based on judgments.
Our results show CUT can beat the 175B DaVinci003 and surpass the best baseline by 50.84 points on AlpacaEval.
arXiv Detail & Related papers (2023-12-22T10:29:43Z)
- Fake Alignment: Are LLMs Really Aligned Well? [91.26543768665778]
This study investigates the substantial discrepancy in performance between multiple-choice questions and open-ended questions.
Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization.
arXiv Detail & Related papers (2023-11-10T08:01:23Z)
- Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses.
Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones.
Secondly, we devise a new loss function that can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z)
- Preference Ranking Optimization for Human Alignment [90.6952059194946]
Large language models (LLMs) often generate misleading content, emphasizing the need to align them with human values.
Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment.
We propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to fine-tune LLMs for human alignment.
arXiv Detail & Related papers (2023-06-30T09:07:37Z)