Patience Is The Key to Large Language Model Reasoning
- URL: http://arxiv.org/abs/2411.13082v2
- Date: Tue, 26 Nov 2024 10:57:42 GMT
- Title: Patience Is The Key to Large Language Model Reasoning
- Authors: Yijiong Yu,
- Abstract summary: We propose a simple method by encouraging models to adopt a more patient reasoning style.
We generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses.
Our results demonstrate a performance increase of up to 6.7% on GSM8k with training just on a lightweight dataset.
- Score: 0.0
- License:
- Abstract: Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning ability, limiting their potential in solving complex tasks. To bridge this gap, following the concept of scaling test-time, we propose a simple method by encouraging models to adopt a more patient reasoning style without the need of introducing new knowledge or skills. To employ a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results demonstrate a performance increase of up to 6.7% on GSM8k with training just on a lightweight dataset.
Related papers
- Context-Aware Multimodal Pretraining [72.04020920042574]
We show that vision-language models can be trained to exhibit significantly increased few-shot adaptation.
We find up to four-fold improvements in test-time sample efficiency, and average few-shot adaptation gains of over 5%.
arXiv Detail & Related papers (2024-11-22T17:55:39Z) - Learning-to-Defer for Extractive Question Answering [3.6787328174619254]
We introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering.
Our results demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to their larger counterparts while preserving computing efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z) - Self-training Language Models for Arithmetic Reasoning [0.0]
We explore the potential of improving models' reasoning capabilities without new data.
We find that models can substantially improve in both single-round (offline) and online self-training.
arXiv Detail & Related papers (2024-07-11T11:06:05Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Teaching Smaller Language Models To Generalise To Unseen Compositional
Questions [6.9076450524134145]
We propose a combination of multitask pretraining on up to 93 tasks designed to instill diverse reasoning abilities.
We show that performance can be significantly improved by adding retrieval-augmented training datasets.
arXiv Detail & Related papers (2023-08-02T05:00:12Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attributes.
We propose a novel search strategy based on the greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z) - Building Accurate Simple Models with Multihop [13.182955266765653]
We propose a meta-approach where we transfer information from the complex model to the simple model.
Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches.
In the experiments on real data, we observe that we get consistent gains for different choices of models over 1-hop.
arXiv Detail & Related papers (2021-09-14T20:39:11Z) - End-to-end Neural Coreference Resolution Revisited: A Simple yet
Effective Baseline [20.431647446999996]
We propose a simple yet effective baseline for coreference resolution.
Our model is a simplified version of the original neural coreference resolution model.
Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models.
arXiv Detail & Related papers (2021-07-04T18:12:24Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.