Replacing thinking with tool usage enables reasoning in small language models
- URL: http://arxiv.org/abs/2507.05065v1
- Date: Mon, 07 Jul 2025 14:49:18 GMT
- Title: Replacing thinking with tool usage enables reasoning in small language models
- Authors: Corrado Rainone, Tim Bakker, Roland Memisevic
- Abstract summary: Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In this paper, we propose to format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL.
- Score: 2.357055571094446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of "thoughts" expressed in natural language. In this paper, we propose to instead format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL. We benchmark this approach on the problem of repairing malfunctioning Python code, and show that this constrained setup allows for faster sampling of experience and a denser reward signal, allowing even models of size up to 3B parameters to learn how to proficiently expend additional compute on the task.
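The interaction format described in the abstract can be illustrated with a minimal sketch. The paper's actual tool and DSL are not specified here, so the two-command editor DSL (`REPLACE`, `RUN`), the `CodeRepairTool` class, and the scripted commands below are hypothetical stand-ins for the real interface:

```python
# Toy version of a stateful tool for repairing malfunctioning Python code.
# Each turn, the model emits a DSL command; the tool's new state is appended
# to the context. Here the "model" is replaced by a scripted command list.

class CodeRepairTool:
    """Stateful tool: holds source lines; its state is re-shown every turn."""
    def __init__(self, source: str):
        self.lines = source.splitlines()

    def execute(self, command: str) -> str:
        op, _, rest = command.partition(" ")
        if op == "REPLACE":                 # REPLACE <lineno> <new line text>
            lineno, _, code = rest.partition(" ")
            self.lines[int(lineno) - 1] = code
            return self.state()
        if op == "RUN":                     # execute the buffer, report result/error
            try:
                namespace = {}
                exec("\n".join(self.lines), namespace)
                return f"OK: add(2, 3) = {namespace['add'](2, 3)}"
            except Exception as e:
                return f"ERROR: {e!r}"
        return "UNKNOWN COMMAND"

    def state(self) -> str:
        return "\n".join(f"{i + 1}: {l}" for i, l in enumerate(self.lines))

buggy = "def add(a, b):\n    return a - b"   # malfunctioning code to repair
tool = CodeRepairTool(buggy)
context = [tool.state()]                      # turn 0: initial tool state

# Scripted commands standing in for the policy model's outputs.
# (Extra spaces after the line number encode the replacement's indentation.)
for cmd in ["REPLACE 2     return a + b", "RUN"]:
    context.append(tool.execute(cmd))         # new tool state appended each turn

print(context[-1])   # → OK: add(2, 3) = 5
```

A verifiable reward then falls out naturally: the final `RUN` result either passes the task's tests or reports an error, giving the dense signal the abstract describes.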
Related papers
- NNTile: a machine learning framework capable of training extremely large GPT language models on a single node [83.9328245724548]
NNTile is based on the StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units. This means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or GPU devices.
arXiv Detail & Related papers (2025-04-17T16:22:32Z)
- Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt [7.096646842716599]
We introduce language hooks, a novel framework for augmenting language models with new capabilities. We benchmark our method against state-of-the-art baselines and find that it outperforms task-aware approaches.
arXiv Detail & Related papers (2024-12-08T15:16:17Z)
- Training of Scaffolded Language Models with Language Supervision: A Survey [62.59629932720519]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
- A Study on the Calibration of In-context Learning [27.533223818505682]
We study in-context learning (ICL), a prevalent method for adapting static language models through tailored prompts.
We observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration.
We explore recalibration techniques and find that a scaling-binning calibrator can reduce calibration errors consistently.
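The scaling-binning calibrator mentioned above can be sketched for a binary case as follows. This is a hedged illustration, not the paper's implementation: the grid-searched temperature, the class name, and the toy data are all assumptions made here for clarity.

```python
# Sketch of a scaling-binning calibrator: (1) fit a scaling function
# (temperature scaling, via a simple NLL grid search) on held-out data,
# then (2) discretize its outputs into equal-mass bins, replacing each
# score with the mean scaled score of its bin.
import numpy as np

class ScalingBinningCalibrator:
    def __init__(self, n_bins=10):
        self.n_bins = n_bins

    def fit(self, logits, labels):
        # Step 1: pick the temperature minimizing NLL of sigmoid(logit / T).
        grid = np.linspace(0.25, 4.0, 64)
        nlls = []
        for t in grid:
            p = self._sigmoid(logits / t)
            nlls.append(-np.mean(labels * np.log(p + 1e-12)
                                 + (1 - labels) * np.log(1 - p + 1e-12)))
        self.temperature = grid[int(np.argmin(nlls))]
        # Step 2: equal-mass bin edges over the scaled scores; each bin
        # outputs the mean scaled score of the calibration points in it.
        p = self._sigmoid(logits / self.temperature)
        self.edges = np.quantile(p, np.linspace(0, 1, self.n_bins + 1))
        idx = self._bin_index(p)
        self.bin_means = np.array(
            [p[idx == b].mean() if np.any(idx == b) else 0.5
             for b in range(self.n_bins)])
        return self

    def predict(self, logits):
        p = self._sigmoid(logits / self.temperature)
        return self.bin_means[self._bin_index(p)]

    def _bin_index(self, p):
        raw = np.searchsorted(self.edges, p, side="right") - 1
        return np.clip(raw, 0, self.n_bins - 1)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

# Toy usage: noisy, overconfident logits get softened and binned.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 2000)
logits = 4.0 * (labels - 0.5) + rng.normal(0, 2.0, 2000)
cal = ScalingBinningCalibrator().fit(logits, labels)
out = cal.predict(logits)   # at most n_bins distinct calibrated scores
```

The binning step is what makes calibration error measurable without binning artifacts: the calibrator emits only finitely many distinct values, so each one's empirical accuracy can be estimated directly.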
arXiv Detail & Related papers (2023-12-07T03:37:39Z)
- Adaptive Gating in Mixture-of-Experts based Language Models [7.936874532105228]
Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models.
This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts.
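A minimal sketch of the idea, assuming one simple confidence rule: a token goes to its top-1 expert when the router is confident, and to its top-2 experts otherwise. The threshold value and the toy linear "experts" below are illustrative assumptions, not the paper's architecture.

```python
# Adaptive gating for MoE (sketch): the number of experts per token is
# variable, decided from the router's top gate probability.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_moe(tokens, router_w, experts, threshold=0.6):
    """tokens: (n, d); router_w: (d, n_experts); experts: list of (d, d)."""
    gates = softmax(tokens @ router_w)              # (n, n_experts)
    order = np.argsort(-gates, axis=1)              # experts by decreasing prob
    out = np.zeros_like(tokens)
    expert_loads = np.zeros(len(experts), dtype=int)
    for i, (g, o) in enumerate(zip(gates, order)):
        k = 1 if g[o[0]] >= threshold else 2        # adaptive: 1 or 2 experts
        chosen = o[:k]
        weights = g[chosen] / g[chosen].sum()       # renormalize over chosen
        for w, e in zip(weights, chosen):
            out[i] += w * (tokens[i] @ experts[e])
            expert_loads[e] += 1
    return out, expert_loads

rng = np.random.default_rng(1)
d, n_experts = 8, 4
tokens = rng.normal(size=(16, d))
router_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
out, loads = adaptive_moe(tokens, router_w, experts)
```

Compute saved is visible in `loads`: its sum lies between n (every token routed once) and 2n (every token routed twice), rather than being fixed at 2n as in standard top-2 gating.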
arXiv Detail & Related papers (2023-10-11T04:30:18Z)
- Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Fine-grained Multi-Modal Self-Supervised Learning [4.850800439026724]
Multi-Modal Self-Supervised Learning from videos has been shown to improve models' performance on various downstream tasks.
Such pre-training requires large batch sizes and a large amount of computation resources due to the noise present in uncurated data.
We propose a fine-grained multi-modal self-supervised training scheme that computes the similarity between embeddings at a finer scale.
arXiv Detail & Related papers (2021-12-22T19:17:45Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
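The datastore mechanism summarized above can be sketched in a hedged way: the base model's next-token distribution is interpolated with a distribution built from the k nearest entries of an external store of (context-vector, next-token) pairs. The vectors, vocabulary, and interpolation weight below are toy assumptions; the paper's speed-ups come from shrinking and accelerating exactly this lookup.

```python
# kNN-LM-style interpolation (sketch): p = (1 - lam) * p_LM + lam * p_kNN,
# where p_kNN places mass on the next tokens recorded for the k datastore
# entries closest to the current context vector.
import numpy as np

def knn_lm_probs(query, p_lm, keys, values, vocab_size, k=4, lam=0.25, tau=1.0):
    d2 = ((keys - query) ** 2).sum(axis=1)        # squared L2 distances
    nn = np.argsort(d2)[:k]                       # k nearest datastore entries
    w = np.exp(-d2[nn] / tau)
    w /= w.sum()                                  # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    for weight, idx in zip(w, nn):
        p_knn[values[idx]] += weight              # mass on each neighbor's token
    return (1 - lam) * p_lm + lam * p_knn

rng = np.random.default_rng(2)
vocab, d = 10, 6
keys = rng.normal(size=(100, d))                  # datastore context vectors
values = rng.integers(0, vocab, 100)              # their recorded next tokens
p_lm = np.full(vocab, 1.0 / vocab)                # uniform base LM, for illustration
p = knn_lm_probs(keys[0], p_lm, keys, values, vocab)
```

Because the query here is itself a datastore key, its own recorded next token receives extra probability mass relative to the uniform base distribution.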
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from the computation for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
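The early-exit mechanism summarized in that entry can be sketched as follows, assuming a classifier attached after every layer and a fixed confidence threshold; the toy layers, depth, and threshold are illustrative choices, not the paper's fine-tuned model.

```python
# Confidence-based early exit (sketch): run layers one at a time and stop
# at the first layer whose exit classifier is confident enough, so simple
# inputs consume fewer layers than hard ones.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(x, layers, heads, threshold=0.9):
    """Return (predicted class, number of layers actually executed)."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = np.tanh(h @ layer)             # one toy "transformer layer"
        probs = softmax(h @ head)          # this layer's exit classifier
        if probs.max() >= threshold:       # confident enough: exit early
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth      # fall through: full model used

rng = np.random.default_rng(3)
d, n_classes, n_layers = 16, 3, 6
layers = [rng.normal(scale=0.5, size=(d, d)) for _ in range(n_layers)]
heads = [rng.normal(size=(d, n_classes)) for _ in range(n_layers)]
label, used = early_exit_predict(rng.normal(size=d), layers, heads)
```

The savings come from `used`: whenever it is below `n_layers`, the remaining layers were never executed for that instance.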
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.