Related papers: Reasoning Distillation and Structural Alignment for Improved Code Generation

Reasoning Distillation and Structural Alignment for Improved Code Generation

URL: http://arxiv.org/abs/2510.17598v1
Date: Mon, 20 Oct 2025 14:47:47 GMT
Title: Reasoning Distillation and Structural Alignment for Improved Code Generation
Authors: Amir Jalilifard, Anderson de Rezende Rocha, Marcos Medeiros Raimundo,
Abstract summary: This work distills the reasoning capabilities of a large language model into a smaller, more efficient model that is faster and cheaper to deploy.<n>Our approach trains the model to emulate the reasoning and problem-solving abilities of the VLLM by learning to identify correct solution pathways.<n> Experimental results show that our fine-tuned model, developed through a cheap and simple to implement process, significantly outperforms our baseline model in terms of pass@1, average data flow, and average syntax match metrics.
Score: 0.6933020649563103
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective code generation with language models hinges on two critical factors: accurately understanding the intent of the prompt and generating code that applies algorithmic reasoning to produce correct solutions capable of passing diverse test cases while adhering to the syntax of the target programming language. Unlike other language tasks, code generation requires more than accurate token prediction; it demands comprehension of solution-level and structural relationships rather than merely generating the most likely tokens. very large language model (VLLM) are capable of generating detailed steps toward the correct solution of complex tasks where reasoning is crucial in solving the problem. Such reasoning capabilities may be absent in smaller language models. Therefore, in this work, we distill the reasoning capabilities of a VLLM into a smaller, more efficient model that is faster and cheaper to deploy. Our approach trains the model to emulate the reasoning and problem-solving abilities of the VLLM by learning to identify correct solution pathways and establishing a structural correspondence between problem definitions and potential solutions through a novel method of structure-aware loss optimization. This enables the model to transcend token-level generation and to deeply grasp the overarching structure of solutions for given problems. Experimental results show that our fine-tuned model, developed through a cheap and simple to implement process, significantly outperforms our baseline model in terms of pass@1, average data flow, and average syntax match metrics across the MBPP, MBPP Plus, and HumanEval benchmarks.

Related papers

The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [45.08958917457921]
Large language models (LLMs) still struggle across tasks outside of high-resource languages.<n>In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
From Reasoning to Code: GRPO Optimization for Underrepresented Languages [0.7864304771129751]
This paper introduces a generalizable approach that uses small-scale code versions of the Qwen 2.5 model combined with Group Relative Policy Optimization.<n>It produces logically consistent and syntactically accurate code by directly integrating reasoning-driven feedback into the reinforcement learning loop.
arXiv Detail & Related papers (2025-05-20T11:28:48Z)
Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models (LMs)<n>DisCIPL generates a task-specific inference program that is executed by a population of Follower models.<n>Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z)
Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation [15.73070332657629]
Large language models (LLMs) have achieved significant advancements in software mining.<n> handling the syntactic structure of source code remains a challenge.<n>This paper employs incontext learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z)
Modularization is Better: Effective Code Generation with Modular Prompting [9.955541341324007]
We propose a novel prompting technique, called MoT, to enhance the code generation performance of Large Language Models.<n>MoT exploits modularization principles to decompose complex programming problems into smaller, independent reasoning steps.<n>It structures the reasoning process using an MLR Graph, which hierarchically organizes reasoning steps.
arXiv Detail & Related papers (2025-03-16T12:23:23Z)
Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making. Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance. We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.<n>We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.<n> Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and. Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting. LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows textcode-form plans -- pseudocode that outlines high-level, structured reasoning processes. CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models. We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.