LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
- URL: http://arxiv.org/abs/2510.14331v1
- Date: Thu, 16 Oct 2025 06:10:11 GMT
- Title: LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
- Authors: Shivam Singhal, Eran Malach, Tomaso Poggio, Tomer Galanti
- Abstract summary: LLM-ERM is a propose-and-verify framework that replaces exhaustive enumeration with an LLM-guided search over candidate programs. We show that coordinate-wise online mini-batch SGD requires many samples to learn certain short programs. These results indicate that language-guided program synthesis recovers much of the statistical efficiency of finite-class ERM.
- Score: 23.97383442759484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We seek algorithms for program learning that are both sample-efficient and computationally feasible. Classical results show that targets admitting short program descriptions (e.g., short Python code) can be learned from a small number of examples (scaling with the size of the code) via length-first program enumeration, but the search is exponential in description length. Gradient-based training avoids this cost, yet it can require exponentially many samples on certain short-program families. To address this gap, we introduce LLM-ERM, a propose-and-verify framework that replaces exhaustive enumeration with an LLM-guided search over candidate programs while retaining ERM-style selection on held-out data. Specifically, we draw $k$ candidates with a pretrained reasoning-augmented LLM, compile and check each on the data, and return the best verified hypothesis, with no feedback, adaptivity, or gradients. Theoretically, we show that coordinate-wise online mini-batch SGD requires many samples to learn certain short programs. Empirically, LLM-ERM solves tasks such as parity variants, pattern matching, and primality testing with as few as 200 samples, while SGD-trained transformers overfit even with 100,000 samples. These results indicate that language-guided program synthesis recovers much of the statistical efficiency of finite-class ERM while remaining computationally tractable, offering a practical route to learning succinct hypotheses beyond the reach of gradient-based training.
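The propose-and-verify loop described in the abstract is simple enough to sketch. Below is a minimal, illustrative Python rendering under stated assumptions, not the authors' implementation: `propose_program` stands in for a call to a pretrained reasoning-augmented LLM, the entry-point name `hypothesis` is an invented convention, and "check on the data" is read here as rejecting any candidate that misclassifies a training example before selecting by held-out error.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]  # (input, label)

def llm_erm(
    train: List[Example],
    holdout: List[Example],
    propose_program: Callable[[], str],  # placeholder for an LLM sampling call
    k: int = 200,
) -> Optional[Callable[[str], int]]:
    """Draw k candidate programs, compile and check each, and return the
    best verified hypothesis by held-out error (one plausible reading of
    the loop in the abstract)."""
    best_fn, best_err = None, float("inf")
    for _ in range(k):
        source = propose_program()               # independent sample, no feedback
        namespace: dict = {}
        try:
            exec(source, namespace)              # compile/run the candidate program
            candidate = namespace["hypothesis"]  # assumed entry-point name
            if any(candidate(x) != y for x, y in train):
                continue                         # reject candidates that fail the data
            err = sum(candidate(x) != y for x, y in holdout) / len(holdout)
        except Exception:
            continue                             # discard programs that do not run
        if err < best_err:
            best_fn, best_err = candidate, err
    return best_fn
```

Because each candidate is drawn independently and failing programs are simply discarded, the loop uses no feedback, adaptivity, or gradients, matching the setting stated in the abstract.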
Related papers
- On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection [8.460805514983816]
Large language models (LLMs) have demonstrated impressive capabilities for many coding tasks. However, detecting code vulnerabilities remains a challenging task for LLMs. In-context learning (ICL) provides few-shot examples similar to the query, along with correct answers.
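As a rough illustration of that ICL setup (hypothetical code, not from the paper): labeled snippets most similar to the query are retrieved from a pool and prepended to the prompt. The `build_icl_prompt` helper and its token-overlap similarity are assumptions made for this sketch.

```python
from typing import List, Tuple

def build_icl_prompt(query: str, pool: List[Tuple[str, str]], shots: int = 3) -> str:
    """Select the `shots` pool examples most similar to the query (Jaccard
    over whitespace tokens) and format them as few-shot demonstrations."""
    def similarity(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(1, len(ta | tb))
    ranked = sorted(pool, key=lambda ex: similarity(query, ex[0]), reverse=True)
    demos = "\n\n".join(f"Code:\n{code}\nAnswer: {label}" for code, label in ranked[:shots])
    return f"{demos}\n\nCode:\n{query}\nAnswer:"
```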
arXiv Detail & Related papers (2025-10-31T17:41:58Z)
- Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [65.18157595903124]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z)
- Generating Diverse Training Samples for Relation Extraction with Large Language Models [30.196619805354622]
We study how to effectively improve the diversity of the training samples generated with Large Language Models (LLMs) for Relation Extraction (RE). Our experiments on commonly used RE datasets show that both attempts can improve the quality of the generated training data.
arXiv Detail & Related papers (2025-05-29T05:21:54Z)
- Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs [13.708569727719434]
Large Language Models (LLMs) show promising performance on various programming tasks. We aim to remedy this potential blind spot by augmenting standard automated program repair (APR) prompts with program execution traces.
arXiv Detail & Related papers (2025-05-07T14:12:41Z)
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- AIME: AI System Optimization via Multiple LLM Evaluators [79.03422337674664]
AIME is an evaluation protocol in which multiple LLMs each independently generate an evaluation on separate criteria; the evaluations are then combined via concatenation.
We show AIME outperforming baseline methods in code generation tasks, with up to 62% higher error detection rate and up to 16% higher success rate than a single-LLM evaluation protocol on the LeetCodeHard and HumanEval datasets.
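For intuition, the combination step might look like the following sketch (hypothetical; `evaluate` is a placeholder for a single-criterion LLM evaluation call, not an API from the paper):

```python
from typing import Callable, List

def aime_combine(
    code: str,
    criteria: List[str],
    evaluate: Callable[[str, str], str],  # placeholder: (code, criterion) -> critique
) -> str:
    """Run one independent LLM evaluation per criterion, then combine the
    verdicts by simple concatenation, as the abstract describes."""
    verdicts = [f"[{criterion}] {evaluate(code, criterion)}" for criterion in criteria]
    return "\n".join(verdicts)
```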
arXiv Detail & Related papers (2024-10-04T04:03:24Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation [9.409062607311528]
Large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation. Existing approaches to fine-tuning LLMs for RTL generation are typically conducted on fixed datasets. We introduce an iterative training paradigm named ITERTL to mitigate these issues. Our model outperforms GPT4 and state-of-the-art (SOTA) open-source models, achieving a remarkable 53.8% pass@1 rate on the VerilogEval-human benchmark.
arXiv Detail & Related papers (2024-06-28T01:44:57Z)
- Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD).
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM) which alternately learns the low-dimensional subspace and the regressors.
We show that for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
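As a rough illustration of alternating minimization in this linear meta-learning setting, here is a toy NumPy sketch modeling y_t = X_t U w_t + noise; it is not the paper's MLLAM algorithm, and its subspace update (plain least squares on vec(U) followed by re-orthonormalization) is an assumption made for illustration:

```python
import numpy as np

def alt_min_meta(tasks, r, iters=50, seed=0):
    """Alternately fit per-task regressors w_t and a shared d x r subspace U.

    tasks: list of (X, y) pairs with X of shape (n_t, d); r: subspace dimension.
    """
    d = tasks[0][0].shape[1]
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # random orthonormal init
    W = []
    for _ in range(iters):
        # Step 1: per-task least squares within the current subspace
        W = [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in tasks]
        # Step 2: least squares for U given the regressors, solved via vec(U)
        # using X_t @ U @ w_t = (w_t^T kron X_t) @ vec(U) (column-major vec)
        A = np.vstack([np.kron(w[None, :], X) for (X, y), w in zip(tasks, W)])
        b = np.concatenate([y for _, y in tasks])
        u = np.linalg.lstsq(A, b, rcond=None)[0]
        U, _ = np.linalg.qr(u.reshape((d, r), order="F"))  # re-orthonormalize
    return U, np.stack(W)
```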
arXiv Detail & Related papers (2021-05-18T06:46:48Z)