Related papers: A Technical Study into 0.5B Reasoning Language Models

URL: http://arxiv.org/abs/2506.13404v2
Date: Fri, 20 Jun 2025 16:50:22 GMT
Title: A Technical Study into 0.5B Reasoning Language Models
Authors: Xialie Zhuang, Peixian Ma, Zhikai Jia, Shiwei Liu, Zheng Cao,
Abstract summary: Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost effectiveness. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), to enhance the performance of 0.5B SRLMs.
Score: 20.004980571905463
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning and code generation. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.

Related papers

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies [66.83950068218033]
Scaling laws demonstrate that scaling model parameters and training data enhances learning performance. Despite its potential to improve performance, the integration of scaling laws into deep reinforcement learning has not been fully realized. This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget.
arXiv Detail & Related papers (2025-08-05T08:03:12Z)
Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning [41.40603531008809]
We evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments. Our analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures.
arXiv Detail & Related papers (2025-07-01T21:21:42Z)
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning [17.421901873720156]
This paper proposes a novel RL framework called Vision-EKIPL. It introduces high-quality actions generated by external auxiliary models during the RL training process to guide the optimization of the policy model. It achieves up to a 5% performance improvement on the Reason-RFT-CoT Benchmark compared to the state-of-the-art (SOTA).
arXiv Detail & Related papers (2025-06-07T16:37:46Z)
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study [16.441081996257576]
This paper presents a rigorous experimental investigation into how difficulty-aware staged reinforcement learning strategies can substantially improve reasoning performance. We show that strategically selecting training data according to well-defined difficulty levels markedly enhances RL optimization. We will open-source our datasets on GitHub and Hugging Face.
arXiv Detail & Related papers (2025-04-01T14:18:38Z)
Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability [1.542607498220242]
This research focuses on the systematic evaluation of individual weight importance throughout the training process. We propose a method that effectively reduces model size without compromising performance. These findings highlight the critical need for optimized AI models to ensure sustainable development.
arXiv Detail & Related papers (2025-02-24T11:34:49Z)
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective [82.9413277326097]
Chain-of-Reasoning (CoR) is a novel unified framework integrating multiple reasoning paradigms. CoR generates multiple potential answers via different reasoning paradigms and synthesizes them into a coherent final solution. Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models.
arXiv Detail & Related papers (2025-01-19T16:53:26Z)
Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
Do Generative Large Language Models need billions of parameters? [0.0]
The research explores novel methods that allow different parts of the model to share parameters. This approach ensures that the model remains compact without sacrificing its ability to learn and represent complex language structures.
arXiv Detail & Related papers (2023-09-12T20:25:22Z)
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA [5.117094291273979]
Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks. We propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. Our 80-million-parameter model is able to exceed the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting.
arXiv Detail & Related papers (2023-08-09T03:18:07Z)
Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment. We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios. We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.