Enhancing the Reasoning Capabilities of Small Language Models via Solution Guidance Fine-Tuning
- URL: http://arxiv.org/abs/2412.09906v1
- Date: Fri, 13 Dec 2024 06:45:26 GMT
- Title: Enhancing the Reasoning Capabilities of Small Language Models via Solution Guidance Fine-Tuning
- Authors: Jing Bi, Yuting Wu, Weiwei Xing, Zhenjie Wei,
- Abstract summary: This paper introduces Solution Guidance (SG) and a plug-and-play training paradigm Solution-Guidance Fine-Tuning (SGFT)
SG focuses on problem understanding and decomposition at the semantic and logical levels, rather than specific computations.
SGFT can fine-tune a SLM to produce accurate problem-solving guidances, which can be flexibly fed to any SLM as prompts.
- Score: 14.857842644246634
- License:
- Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks. Advances in prompt engineering and fine-tuning techniques have further enhanced their ability to address complex reasoning challenges. However, these advanced capabilities are often exclusive to models exceeding 100 billion parameters. Although Chain-of-Thought (CoT) fine-tuning methods have been explored for smaller models (under 10 billion parameters), they typically depend on extensive CoT training data, which can introduce inconsistencies and limit effectiveness in low-data settings. To overcome these limitations, this paper introduce a new reasoning strategy Solution Guidance (SG) and a plug-and-play training paradigm Solution-Guidance Fine-Tuning (SGFT) for enhancing the reasoning capabilities of small language models. SG focuses on problem understanding and decomposition at the semantic and logical levels, rather than specific computations, which can effectively improve the SLMs' generalization and reasoning abilities. With only a small amount of SG training data, SGFT can fine-tune a SLM to produce accurate problem-solving guidances, which can then be flexibly fed to any SLM as prompts, enabling it to generate correct answers directly. Experimental results demonstrate that our method significantly improves the performance of SLMs on various reasoning tasks, enhancing both their practicality and efficiency within resource-constrained environments.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Unified Parameter-Efficient Unlearning for LLMs [25.195126838721492]
Large Language Models (LLMs) have revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks.
This raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information.
We introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise adjustments using influence functions.
arXiv Detail & Related papers (2024-11-30T07:21:02Z) - Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling.
It focuses on specific logical and mathematical reasoning tasks.
The approach aims to improve performances of small-scale models for these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge
Distillation in Small Models for Scientific QA [5.117094291273979]
Large Language Models (LLMs) have shown outstanding performance across wide range of downstream tasks.
We propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers.
Our 80-million parameter model is able to exceed the performance of BLOOM-176B in the ARC-Easy dataset under the few shot setting.
arXiv Detail & Related papers (2023-08-09T03:18:07Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
Theses provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.