Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
- URL: http://arxiv.org/abs/2404.03414v1
- Date: Thu, 4 Apr 2024 12:46:37 GMT
- Title: Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
- Authors: Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, Kai-Wei Chang, Chengwei Su
- Abstract summary: We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
- Score: 51.240387516059535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method on two multi-hop extractive question answering (QA) benchmarks: HotpotQA and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines in answer prediction accuracy. We also find that reinforcement learning helps the model produce higher-quality rationales that improve QA performance.
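As a concrete illustration, here is a minimal sketch of the two-model inference pipeline described in the abstract, using Hugging Face transformers. The model names, prompt templates, and generation settings are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of LM-Guided CoT inference: a small LM writes the
# rationale, and a frozen large LM answers conditioned on it.
# Model names and prompt templates below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

SMALL = "gpt2"        # stand-in for the <1B rationale generator
LARGE = "gpt2-large"  # stand-in for the frozen >10B answer model

small_tok = AutoTokenizer.from_pretrained(SMALL)
small_lm = AutoModelForCausalLM.from_pretrained(SMALL)
large_tok = AutoTokenizer.from_pretrained(LARGE)
large_lm = AutoModelForCausalLM.from_pretrained(LARGE)

def generate(model, tok, prompt, max_new_tokens=64):
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=False, pad_token_id=tok.eos_token_id)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:],
                      skip_special_tokens=True)

def lm_guided_cot(question, context):
    # Step 1: the lightweight LM produces a rationale. This is the only
    # trainable component; the paper tunes it with knowledge
    # distillation plus RL on rationale- and task-oriented rewards.
    rationale = generate(
        small_lm, small_tok,
        f"Context: {context}\nQuestion: {question}\nRationale:")
    # Step 2: the frozen large LM predicts the answer from the rationale.
    answer = generate(
        large_lm, large_tok,
        f"Context: {context}\nQuestion: {question}\n"
        f"Rationale: {rationale}\nAnswer:")
    return rationale, answer
```

Only the small model would ever receive gradient updates; the large model is queried as a black box, which is what makes the approach resource-efficient.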
Related papers
- On the Emergence of Thinking in LLMs I: Searching for the Right Intuition [34.32871896067864]
We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP)
RLSP involves three steps: supervised fine-tuning with human or synthetic demonstrations of the reasoning process, using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and RL training with an outcome verifier to ensure correctness while preventing reward hacking.
Empirical studies in the math domain show that RLSP improves reasoning.
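A toy sketch of how RLSP's two reward signals could be combined; the diversity proxy, exact-match verifier, and weighting below are placeholder assumptions rather than the paper's actual design.

```python
# Toy RLSP-style reward: an exploration bonus on the reasoning trace
# plus a binary outcome reward from a verifier. Both components are
# placeholder assumptions.

def exploration_bonus(trace: str) -> float:
    # Crude diversity proxy: fraction of distinct tokens in the trace.
    tokens = trace.split()
    return len(set(tokens)) / max(len(tokens), 1)

def outcome_reward(answer: str, gold: str) -> float:
    # Outcome verifier: grounding the reward in correctness is what
    # prevents the policy from hacking the exploration bonus alone.
    return 1.0 if answer.strip() == gold.strip() else 0.0

def rlsp_reward(trace: str, answer: str, gold: str,
                alpha: float = 0.1) -> float:
    return outcome_reward(answer, gold) + alpha * exploration_bonus(trace)
```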
arXiv Detail & Related papers (2025-02-10T18:52:04Z)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format-tuning stage to internalize the Chain-of-Action-Thought (COAT) reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
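A runnable toy skeleton of that two-stage paradigm; the tabular "policy", demonstration data, and update rules are stand-ins, not Satori's actual training code.

```python
# Toy skeleton of the two training stages; everything here (the dict
# "policy", data, reward, updates) is a stand-in assumption.
import random

policy = {}  # toy stand-in for an LLM's behavior

def sft_update(prompt, target):
    # Stage 1 stand-in: memorize demonstrations of the COAT format.
    policy[prompt] = target

def rl_update(prompt, rollout, reward):
    # Stage 2 stand-in: keep a rollout only when the verifier rewards it.
    if reward > 0:
        policy[prompt] = rollout

# 1) Small-scale format tuning on a few COAT-formatted demonstrations.
for prompt, coat in [("2+2?", "<think>2+2=4</think><answer>4</answer>")]:
    sft_update(prompt, coat)

# 2) Large-scale self-improvement: sample rollouts, verify, update.
for _ in range(10):
    prompt, gold = "2+2?", "4"
    rollout = (policy[prompt] if random.random() < 0.9
               else "<think>guess</think><answer>5</answer>")  # exploration
    reward = 1.0 if f"<answer>{gold}</answer>" in rollout else 0.0
    rl_update(prompt, rollout, reward)
```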
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling.
We present T1, which scales reinforcement learning by encouraging exploration, and use it to study inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
- Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering [6.745948705869626]
We argue that prior methods do not sufficiently activate the capacities of Large Language Models (LLMs).
We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA.
arXiv Detail & Related papers (2024-12-22T09:14:35Z)
- Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning [5.487210426671288]
In this work, we demonstrate that the reasoning abilities of small-scale LMs can be enhanced through self-training.
We also show that the conventional self-training can be further augmented by a preference learning algorithm called Direct Preference Optimization.
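For reference, a minimal sketch of the DPO objective mentioned here, written against per-sequence log-probabilities; the batching convention and β value are the usual defaults rather than this paper's specific settings.

```python
# Minimal DPO loss: prefer the chosen chain-of-thought over the
# rejected one, measured relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): minimized when the policy assigns
    # relatively more probability to the preferred reasoning chain.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Dummy per-sequence log-probabilities for a batch of two pairs:
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.0, -13.0]),
                torch.tensor([-10.5, -12.5]), torch.tensor([-10.8, -12.8]))
```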
arXiv Detail & Related papers (2024-07-25T17:59:16Z)
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
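A hedged sketch of that verification idea: the smaller LM re-answers the question from the cited passages alone, and the larger LM's output is accepted only when the two agree. The callable interface here is an assumption; in the full framework, disagreement would trigger another generation round rather than outright rejection.

```python
# CaLM-style agreement check; helper signatures are assumptions.

def calm_verify(question, cited_passages, large_answer, small_answer_fn):
    # The small LM sees only the citations, so it cannot lean on
    # parametric memory the way the large model can.
    small_answer = small_answer_fn(question, cited_passages)
    return small_answer.strip().lower() == large_answer.strip().lower()

# Usage with a dummy "small model" that just quotes the first passage:
ok = calm_verify("Who wrote Hamlet?",
                 ["Shakespeare wrote Hamlet."],
                 "Shakespeare",
                 lambda q, passages: passages[0].split(" wrote")[0])
```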
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models [2.2863439039616127]
Probing assesses the degree to which a language model (LM) has successfully learned relational knowledge during pre-training.
Previous approaches rely on the objective function used in pre-training LMs.
We propose an approach that uses an LM's inherent ability to estimate the log-likelihood of any given textual statement.
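In that spirit, a minimal sketch of scoring candidate statements by the total log-likelihood a causal LM assigns them; the relation template and candidate set are illustrative assumptions.

```python
# Score candidate fact statements by total log-likelihood under a
# causal LM and pick the argmax. Template and candidates are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns the mean token
        # cross-entropy over the n-1 predicted positions.
        loss = lm(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # total log-likelihood

candidates = ["Paris", "Berlin", "Rome"]
scores = {c: sentence_logprob(f"The capital of France is {c}.")
          for c in candidates}
print(max(scores, key=scores.get))  # ideally "Paris"
```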
arXiv Detail & Related papers (2024-04-05T14:13:55Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus or via a genuine multi-step reasoning mechanism.
We show that our probing approach, MechanisticProbe, is able to recover the reasoning tree from the model's attention patterns for most examples.
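A small sketch in the spirit of attention probing: extract the model's attention maps and inspect how much the final position attends back over the context. MechanisticProbe itself trains a probe on such features, which this snippet does not do; the model and prompt are illustrative assumptions.

```python
# Extract attention maps from a causal LM; a probe would be trained on
# features like these to recover the reasoning tree.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

text = "A is B. B is C. Therefore A is"
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    out = lm(ids)

# out.attentions: one tensor per layer, shaped [batch, heads, seq, seq].
last_layer = out.attentions[-1][0]           # [heads, seq, seq]
# Attention from the final position back over the context, head-averaged.
final_pos_attn = last_layer[:, -1, :].mean(dim=0)
print(final_pos_attn)
```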
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
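A toy sketch of that interaction loop: the RL agent proposes a completion, and the LLM is queried only when the proposal scores poorly, which is the sample-efficient-querying idea. All callables and the threshold are stand-in assumptions.

```python
# Toy LaGR-style loop; rl_propose, llm_complete, and evaluate are
# stand-in callables, and the query threshold is an assumption.

def lagr_episode(partial, rl_propose, llm_complete, evaluate,
                 threshold=0.5, budget=5):
    best, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = rl_propose(partial)
        score = evaluate(candidate)
        # Sample-efficient querying: only consult the LLM when the RL
        # agent's own candidate falls below the quality threshold.
        if score < threshold:
            candidate = llm_complete(partial)
            score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```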
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Prompting as Probing: Using Language Models for Knowledge Base Construction [1.6050172226234583]
We present ProP (Prompting as Probing), which utilizes GPT-3, a large language model released by OpenAI in 2020.
ProP implements a multi-step approach that combines a variety of prompting techniques for knowledge base construction.
Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions.
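An illustrative sketch of one such technique: a few-shot prompt that asks the model to enumerate objects for a subject-relation pair, plus a parser for the completion. The template and parsing are assumptions, not ProP's exact prompts.

```python
# Few-shot enumeration prompt for KB construction, plus a parser.
# Template and parsing are illustrative, not ProP's exact prompts.

PROMPT = """List the entities for each query.
Q: countries that border France
A: Belgium, Germany, Italy, Luxembourg, Monaco, Spain, Switzerland, Andorra
Q: {relation} {subject}
A:"""

def build_prompt(subject: str, relation: str) -> str:
    return PROMPT.format(subject=subject, relation=relation)

def parse_objects(completion: str) -> list:
    # Take the first line and split on commas; an empty answer means
    # the model predicts no objects for this subject-relation pair.
    text = completion.strip()
    first = text.splitlines()[0] if text else ""
    return [x.strip() for x in first.split(",") if x.strip()]
```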
arXiv Detail & Related papers (2022-08-23T16:03:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.