Related papers: Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

URL: http://arxiv.org/abs/2506.18777v1
Date: Mon, 23 Jun 2025 15:45:44 GMT
Title: Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis,
Abstract summary: Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities.<n>We propose Programming by Backprop (PBB) as a potential driver of this effect.<n>We show that PBB leads to more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution that mirrors naturally occurring data.
Score: 2.743215038883958
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O examples. To explore this idea, we finetune LLMs on two sets of programs representing simple maths problems and algorithms: one with source code and I/O examples (w/ IO), the other with source code only (w/o IO). We find evidence that LLMs have some ability to evaluate w/o IO programs for inputs in a range of experimental settings, and make several observations. Firstly, PBB works significantly better when programs are provided as code rather than semantically equivalent language descriptions. Secondly, LLMs can produce outputs for w/o IO programs directly, by implicitly evaluating the program within the forward pass, and more reliably when stepping through the program in-context via chain-of-thought. We further show that PBB leads to more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution that mirrors naturally occurring data. Our findings suggest a mechanism for enhanced reasoning through code training: it allows LLMs to internalise reusable algorithmic abstractions. Significant scope remains for future work to enable LLMs to more effectively learn from symbolic procedures, and progress in this direction opens other avenues like model alignment by training on formal constitutional principles.

Related papers

On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding [5.429445008970627]
Large language models (LLMs) achieve remarkable performance in code generation tasks.<n>A significant performance disparity persists between popular programming languages.<n>We leverage the code translation task to train LLMs, thereby facilitating the transfer of coding proficiency.
arXiv Detail & Related papers (2025-05-19T05:25:29Z)
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors [5.247363735860479]
Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks.<n>Given LLMs' ability to understand and process diverse programs, they present a promising direction for building general-purpose surrogate models.<n>We introduce SURGE, a benchmark with $1160$ problems covering $8$ key aspects.<n>Through empirical analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy.
arXiv Detail & Related papers (2025-02-16T15:38:19Z)
An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models [6.976968804436321]
This paper proposes a novel approach to embedding source code by combining large language and sentence embedding models.<n>To evaluate the performance of our proposed approach, we conducted a series of experiments on three datasets with different programming languages.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.<n>Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.<n>We propose a textbfCase2Code task by exploiting the expressiveness and correctness of programs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability. The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts. As the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
Reasoning Runtime Behavior of a Program with LLM: How Far Are We? [25.451857140926943]
Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. Code reasoning is one of the most essential abilities of code LLMs. We propose a framework, namely REval, for evaluating code reasoning abilities and consistency of code LLMs with program execution.
arXiv Detail & Related papers (2024-03-25T05:37:16Z)
Efficient Tool Use with Chain-of-Abstraction Reasoning [63.08202389132155]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.<n>There remains challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems.<n>We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes. We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
Coarse-Tuning Models of Code with Reinforcement Learning Feedback [0.0]
Large Language Models (LLMs) pre-trained on code have emerged as the dominant approach to program synthesis. We propose RLCF, that further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code.
arXiv Detail & Related papers (2023-05-25T22:09:08Z)
Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z)
LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.