Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
- URL: http://arxiv.org/abs/2502.17800v1
- Date: Tue, 25 Feb 2025 03:03:35 GMT
- Title: Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
- Authors: Yihang Yao, Zhepeng Cen, Miao Li, William Han, Yuyou Zhang, Emerson Liu, Zuxin Liu, Chuang Gan, Ding Zhao
- Abstract summary: We propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage. Experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations.
- Score: 66.48331530995786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated strong reasoning capabilities across various tasks. However, even minor variations in query phrasing, despite preserving the underlying semantic meaning, can significantly affect their performance. To address this, we focus on enhancing LLMs' awareness of symmetry in query variations and propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage through query augmentations, enabling more data-efficient training and stronger generalization to Out-of-Distribution (OOD) settings. Extensive experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations, providing new insight into improving LLM robustness through structured dataset curation.
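The abstract does not say which query transformations MEND applies. As a minimal sketch of what a meaning-preserving query augmentation could look like, the Python below assumes one simple symmetry: reordering independent context facts leaves the answer unchanged. The function names (`order_variants`, `augment`) and the toy example are hypothetical illustrations, not the authors' implementation.

```python
from itertools import islice, permutations


def order_variants(facts, question, limit=3):
    """Build up to `limit` queries that differ only in the order of the facts.

    Assumption (not from the paper): the facts are independent statements,
    so any ordering carries the same meaning.
    """
    variants = []
    for perm in islice(permutations(facts), limit):
        variants.append(" ".join(perm) + " " + question)
    return variants


def augment(example, limit=3):
    """Pair every reordered query with the original, unchanged answer."""
    queries = order_variants(example["facts"], example["question"], limit)
    return [{"query": q, "answer": example["answer"]} for q in queries]


if __name__ == "__main__":
    # Toy example invented for illustration only.
    example = {
        "facts": [
            "Alice is older than Bob.",
            "Bob is older than Carol.",
            "Carol is 10.",
        ],
        "question": "Who is the oldest?",
        "answer": "Alice",
    }
    for item in augment(example):
        print(item["query"], "->", item["answer"])
```

Each variant keeps the answer fixed while changing only the surface form of the query, which is the kind of training pair a query-level augmentation would add to a fine-tuning set.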
Related papers
- TRAIL: Joint Inference and Refinement of Knowledge Graphs with Large Language Models [5.678291291711662]
TRAIL is a novel, unified framework for Thinking, Reasoning, And Incremental Learning. It couples joint inference and dynamic KG refinement with large language models. Extensive experiments on multiple benchmarks demonstrate that TRAIL outperforms existing KG-augmented and retrieval-augmented LLM baselines by 3% to 13%.
arXiv Detail & Related papers (2025-08-06T14:25:05Z)
- MeRF: Motivation-enhanced Reinforcement Finetuning for Large Reasoning Models [95.6332110724999]
Motivation-enhanced Reinforcement Finetuning (MeRF) is an intuitive yet effective method for enhancing reinforcement learning of Large Language Models (LLMs). MeRF directly injects the reward specification into the prompt, which serves as an in-context motivation for the model to improve its responses with awareness of the optimization objective. Empirical evaluations on the Knights and Knaves (K&K) logic puzzle reasoning benchmark demonstrate that MeRF achieves substantial performance gains over baselines.
arXiv Detail & Related papers (2025-06-23T10:37:57Z)
- Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study [8.827173113748701]
We study character- and word-level edits of task-specific instructions, which substantially degrade downstream performance.
We find that, on average, self-denoising achieves substantially higher performance gains than alternative strategies.
arXiv Detail & Related papers (2025-04-03T16:17:56Z)
- A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large language model (LLM) reasoning.
We analyze scaling in reasoning steps that improves multi-step inference and logical consistency.
We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z)
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement [91.88062410741833]
This study investigates whether similar reasoning capabilities can be successfully integrated into large vision-language models (LVLMs).
We consider an approach that iteratively leverages supervised fine-tuning (SFT) on lightweight training data and Reinforcement Learning (RL) to further improve model generalization.
OpenVLThinker, an LVLM exhibiting consistently improved reasoning performance on challenging benchmarks such as MathVista, MathVerse, and MathVision, demonstrates the potential of our strategy for robust vision-language reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z)
- Enhancing LLM Knowledge Learning through Generalization [73.16975077770765]
We show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. We propose two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition.
arXiv Detail & Related papers (2025-03-05T17:56:20Z)
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.
Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
- Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach [28.07366458452159]
Large Language Models (LLMs) generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. We propose a more interpretable method (i.e., model editing) to enhance the semantic consistency of LLMs.
arXiv Detail & Related papers (2025-01-19T13:26:15Z)
- Demonstration Selection for In-Context Learning via Reinforcement Learning [16.103533806505403]
Relevance-Diversity Enhanced Selection (RDES) is an innovative approach to optimize the selection of diverse reference demonstrations. RDES employs frameworks like Q-learning and a PPO-based variant to dynamically identify demonstrations that maximize diversity. We demonstrate that RDES significantly enhances performance compared to ten established baselines.
arXiv Detail & Related papers (2024-12-05T08:33:52Z)
- The Role of Deductive and Inductive Reasoning in Large Language Models [37.430396755248104]
We propose the Deductive and InDuctive (DID) method to enhance Large Language Models (LLMs) reasoning. DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy. Our results demonstrate significant improvements in reasoning quality and solution accuracy.
arXiv Detail & Related papers (2024-10-03T18:30:47Z)
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework. MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
LLM-DA is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task.
Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks [0.0]
Natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way.
While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models.
We propose three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness.
arXiv Detail & Related papers (2023-04-03T08:24:10Z)