Making Large Language Models Better Reasoners with Alignment
- URL: http://arxiv.org/abs/2309.02144v1
- Date: Tue, 5 Sep 2023 11:32:48 GMT
- Title: Making Large Language Models Better Reasoners with Alignment
- Authors: Peiyi Wang and Lei Li and Liang Chen and Feifan Song and Binghuai Lin
and Yunbo Cao and Tianyu Liu and Zhifang Sui
- Abstract summary: Reasoning is a cognitive process of using evidence to reach a sound conclusion.
Recent studies reveal that fine-tuning LLMs on data with the chain of thought (COT) reasoning process can significantly enhance their reasoning capabilities.
We introduce an Alignment Fine-Tuning (AFT) paradigm, which involves three steps.
- Score: 57.82176656663245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning is a cognitive process of using evidence to reach a sound
conclusion. The reasoning capability is essential for large language models
(LLMs) to serve as the brain of an artificial general intelligence agent.
Recent studies reveal that fine-tuning LLMs on data with the chain of thought
(COT) reasoning process can significantly enhance their reasoning capabilities.
However, we find that the fine-tuned LLMs suffer from an \textit{Assessment
Misalignment} problem, i.e., they frequently assign higher scores to subpar
COTs, leading to potential limitations in their reasoning abilities. To address
this problem, we introduce an \textit{Alignment Fine-Tuning (AFT)} paradigm,
which involves three steps: 1) fine-tuning LLMs with COT training data; 2)
generating multiple COT responses for each question, and categorizing them into
positive and negative ones based on whether they achieve the correct answer; 3)
calibrating the scores of positive and negative responses given by LLMs with a
novel constraint alignment loss. Specifically, the constraint alignment loss
has two objectives: a) Alignment, which guarantees that positive scores surpass
negative scores to encourage answers with high-quality COTs; b) Constraint,
which keeps the negative scores confined to a reasonable range to prevent
model degradation. Beyond binary positive and negative feedback, the
constraint alignment loss can be seamlessly adapted to ranking settings
when ranking feedback is available. Furthermore, we delve deeply into
recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and
discover that the constraint, which has been overlooked by these approaches, is
also crucial for their performance. Extensive experiments on four reasoning
benchmarks with both binary and ranking feedback demonstrate the effectiveness
of AFT.
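
To make the two objectives concrete, below is a minimal PyTorch sketch of a constraint-alignment-style loss for the binary feedback case. It is not the paper's exact formulation: the length-normalized sequence score, the hinge `margin`, and the `floor` constraint are illustrative assumptions chosen only to show how an alignment term (positives above negatives) and a constraint term (negatives not pushed arbitrarily low) can be combined.

```python
# Hedged sketch of a constraint-alignment-style loss for binary CoT feedback.
# NOT the exact AFT loss from the paper; `margin` and `floor` are illustrative
# assumptions, as is scoring a CoT by its length-normalized log-likelihood.
import torch


def sequence_score(logprobs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-likelihood of each sampled CoT.

    logprobs: (num_cots, seq_len) per-token log-probabilities under the model.
    mask:     (num_cots, seq_len) 1.0 for response tokens, 0.0 elsewhere.
    """
    return (logprobs * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)


def constraint_alignment_loss(pos_scores: torch.Tensor,
                              neg_scores: torch.Tensor,
                              margin: float = 0.1,
                              floor: float = -5.0) -> torch.Tensor:
    """Two-part loss over the scores of the CoTs sampled for one question.

    pos_scores: (P,) scores of CoTs that reached the correct answer.
    neg_scores: (N,) scores of CoTs that did not.
    """
    # (a) Alignment: each positive score should exceed each negative score
    #     by at least `margin` (hinge over all positive/negative pairs).
    pairwise_diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # (P, N)
    alignment = torch.relu(margin - pairwise_diff).mean()

    # (b) Constraint: penalize negative scores that fall below `floor`, so
    #     they are not pushed arbitrarily low (the degradation the abstract
    #     warns about).
    constraint = torch.relu(floor - neg_scores).mean()

    return alignment + constraint


# Toy usage: scores for 2 positive and 3 negative CoTs for one question.
pos = torch.tensor([-1.2, -1.5])
neg = torch.tensor([-1.4, -2.0, -3.1])
loss = constraint_alignment_loss(pos, neg)
```

One way to mirror the ranking-feedback adaptation mentioned in the abstract is to apply the same pairwise hinge between consecutively ranked responses while keeping the constraint term unchanged; again, this is a sketch of the idea, not the paper's formulation.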
Related papers
- Automatic Curriculum Expert Iteration for Reliable LLM Reasoning [60.60318625779015]
Hallucinations (i.e., generating plausible but inaccurate content) and laziness (i.e., excessive refusals or defaulting to "I don't know") persist as major challenges in LLM reasoning.
Current efforts to reduce hallucinations primarily focus on factual errors in knowledge-grounded tasks, often neglecting hallucinations related to faulty reasoning.
We propose Automatic Curriculum Expert Iteration (Auto-CEI) to enhance LLM reasoning and align responses to the model's capabilities.
arXiv Detail & Related papers (2024-10-10T05:43:07Z)
- As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss [26.860139372768092]
We propose a novel alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization.
Our proposed BNF loss eliminates the need for pairwise contrastive losses.
We conduct extensive experiments across two challenging QA benchmarks and four reasoning benchmarks.
arXiv Detail & Related papers (2024-10-07T08:44:04Z)
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options [2.1184929769291294]
This study explores whether large language models (LLMs) prioritize following instructions over reasoning and truth when given "misleading" instructions.
We introduce a new metric called "reflective judgment", which sheds new light on the relationship between the pre-training and post-training alignment schemes.
arXiv Detail & Related papers (2024-08-27T19:27:43Z)
- Don't Say No: Jailbreaking LLM by Suppressing Refusal [13.666830169722576]
In this study, we first uncover the reason why vanilla target loss is not optimal, then we explore and enhance the loss objective and introduce the DSN (Don't Say No) attack.
Existing evaluations such as refusal keyword matching reveal numerous false positive and false negative instances.
To overcome this challenge, we propose an Ensemble Evaluation pipeline that incorporates Natural Language Inference (NLI) contradiction assessment and two external LLM evaluators.
arXiv Detail & Related papers (2024-04-25T07:15:23Z)
- Negating Negatives: Alignment with Human Negative Samples via Distributional Dispreference Optimization [37.8788435790632]
Large language models (LLMs) have revolutionized the role of AI, yet pose potential social risks.
Existing methods rely on high-quality positive-negative training pairs, suffering from noisy positive responses that are barely distinguishable from negative ones.
We propose Distributional Dispreference Optimization (D$^2$O), which maximizes the discrepancy between dispreferred responses and the generated non-negative ones.
arXiv Detail & Related papers (2024-03-06T03:02:38Z)
- Reasons to Reject? Aligning Language Models with Judgments [72.39858230784002]
We explore the use of language feedback to align large language models (LLMs).
We propose Contrastive Unlikelihood Training (CUT) that allows for fine-grained inappropriate content detection and correction based on judgments.
Our results show CUT can beat the 175B DaVinci003 and surpass the best baseline by 50.84 points on AlpacaEval.
arXiv Detail & Related papers (2023-12-22T10:29:43Z)
- Fake Alignment: Are LLMs Really Aligned Well? [91.26543768665778]
This study investigates the substantial discrepancy in performance between multiple-choice questions and open-ended questions.
Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization.
arXiv Detail & Related papers (2023-11-10T08:01:23Z)
- Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses.
Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones.
Secondly, we devise a new loss function that can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z)
- Preference Ranking Optimization for Human Alignment [90.6952059194946]
Large language models (LLMs) often generate misleading content, emphasizing the need to align them with human values.
Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment.
We propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to fine-tune LLMs for human alignment.
arXiv Detail & Related papers (2023-06-30T09:07:37Z)