Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning
- URL: http://arxiv.org/abs/2410.15639v5
- Date: Tue, 10 Jun 2025 08:35:14 GMT
- Title: Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning
- Authors: Yoichi Ishibashi, Taro Yano, Masafumi Oyamada
- Abstract summary: Self-Developing is a framework that enables Large Language Models to autonomously discover, implement, and refine their own improvement algorithms. We demonstrate this framework through model merging, a practical technique for combining specialized models. On mathematical reasoning benchmarks, the autonomously discovered algorithms improve the seed model's GSM8k performance by 6% and exceed human-designed approaches like Task Arithmetic by 4.3%.
- Score: 3.6117068575553595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have achieved remarkable capabilities, yet their improvement methods remain fundamentally constrained by human design. We present Self-Developing, a framework that enables LLMs to autonomously discover, implement, and refine their own improvement algorithms. Our approach employs an iterative cycle in which a seed model generates algorithmic candidates as executable code, evaluates their effectiveness, and uses Direct Preference Optimization to recursively refine increasingly sophisticated improvement strategies. We demonstrate this framework through model merging, a practical technique for combining specialized models. Self-Developing successfully discovered novel merging algorithms that outperform existing human-designed algorithms. On mathematical reasoning benchmarks, the autonomously discovered algorithms improve the seed model's GSM8k performance by 6% and exceed human-designed approaches like Task Arithmetic by 4.3%. Remarkably, these algorithms exhibit strong generalization, achieving 7.4% gains on out-of-domain models without re-optimization. Our findings demonstrate that LLMs can transcend their training to invent genuinely novel optimization techniques. This capability represents a crucial step toward a new era where LLMs not only solve problems but autonomously develop the methodologies for their own advancement.
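The cycle the abstract describes is compact enough to sketch. Below is a minimal, hypothetical Python rendering; the helper names (propose_algorithms, evaluate, dpo_update) are stand-ins for the paper's components, not its actual API:

```python
def self_developing(seed_model, expert_models, benchmark, rounds=3, k=16):
    """Hypothetical driver for the generate -> evaluate -> DPO cycle."""
    best = None
    for _ in range(rounds):
        # 1. The seed model writes k candidate merging algorithms as code.
        candidates = propose_algorithms(seed_model, n=k)
        # 2. Each candidate merges the expert models; the merged model is
        #    scored on the target benchmark (e.g. GSM8k accuracy).
        scores = {c: evaluate(c(expert_models), benchmark) for c in candidates}
        ranked = sorted(candidates, key=scores.get, reverse=True)
        best = ranked[0]
        # 3. High- vs. low-scoring algorithms form preference pairs, and
        #    Direct Preference Optimization pushes the seed model toward
        #    generating better algorithms in the next round.
        pairs = [(ranked[i], ranked[-(i + 1)]) for i in range(k // 2)]
        seed_model = dpo_update(seed_model, pairs)
    return best
```

For context, the Task Arithmetic baseline the abstract cites merges models by adding scaled task vectors (expert weights minus base weights) to the base model's state dict; a standard formulation:

```python
def task_arithmetic(base_state, expert_states, lam=0.3):
    # Task Arithmetic (Ilharco et al.):
    # theta = theta_base + lam * sum_i (theta_i - theta_base)
    return {name: w + lam * sum(e[name] - w for e in expert_states)
            for name, w in base_state.items()}
```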
Related papers
- Evolution of Optimization Algorithms for Global Placement via Large Language Models [18.373855320220887]
This paper presents an automated framework to evolve optimization algorithms for global placement. We first generate diverse candidate algorithms using large language models (LLMs) through carefully crafted prompts. The discovered optimization algorithms exhibit substantial performance improvements across many benchmarks.
arXiv Detail & Related papers (2025-04-18T09:57:14Z) - Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning [12.037588566211348]
We propose to refine large language models (LLMs) through reinforcement learning (RL) fine-tuning.
Our experiments show that combining RL fine-tuning with evolutionary search improves the efficiency of discovering improved algorithms.
arXiv Detail & Related papers (2025-04-07T14:14:15Z) - LLM-Guided Evolution: An Autonomous Model Optimization for Object Detection [0.0]
In machine learning, Neural Architecture Search (NAS) requires domain knowledge of model design and a large amount of trial-and-error to achieve promising performance.
The Large Language Model (LLM)-Guided Evolution (GE) framework transformed this approach by incorporating LLMs to directly modify model source code for image classification algorithms on CIFAR data.
We show that LLM-GE produced variants with significant performance improvements, such as an increase in Mean Average Precision from 92.5% to 94.5%.
arXiv Detail & Related papers (2025-04-03T05:06:06Z) - RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large language models.
AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model.
We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z) - Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms [0.9668407688201361]
Large Language Models (LLMs) have shown notable potential in code generation for optimization algorithms. This paper examines how LLMs, rather than creating algorithms from scratch, can improve existing ones without the need for specialized expertise.
arXiv Detail & Related papers (2025-03-14T00:26:00Z) - From Understanding to Excelling: Template-Free Algorithm Design through Structural-Functional Co-Evolution [39.42526347710991]
Large language models (LLMs) have greatly accelerated the automation of algorithm generation and optimization. We introduce an end-to-end algorithm generation and optimization framework based on LLMs. Our approach utilizes the deep semantic understanding of LLMs to convert natural language requirements or human-authored papers into code solutions.
arXiv Detail & Related papers (2025-03-13T08:26:18Z) - MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks.
This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents.
We evaluate a number of frontier large language models (LLMs), including Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro, on our benchmark.
arXiv Detail & Related papers (2025-02-20T12:28:23Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
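One classic instance of such algorithmic knowledge is the UCB1 rule for stateless bandits; a minimal sketch follows (an illustration of the kind of exploration algorithm meant, not the paper's code):

```python
import math

def ucb1_choose(counts, means, t, c=2.0):
    """Play each arm once, then pick the arm maximizing its empirical
    mean plus an exploration bonus that shrinks as the arm is visited."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # ensure every arm has at least one pull
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]))
```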
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by producing reward models with which to train an agent.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms.
LLMs have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study [5.6787965501364335]
Surrogate-assisted selection is a core step in evolutionary algorithms to solve expensive optimization problems.
Traditionally, this has relied on conventional machine learning methods, leveraging historical evaluations to predict the performance of new solutions.
In this work, we propose a novel surrogate model based purely on LLM inference capabilities, eliminating the need for training.
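In the spirit of that proposal, a training-free surrogate can be as simple as prompting the model with already-evaluated solutions and asking it to rank new ones. A hypothetical sketch (the llm.complete interface and the prompt format are assumptions):

```python
def llm_surrogate_rank(llm, history, candidates):
    """Show the LLM previously evaluated (solution, fitness) pairs and ask
    it to rank unevaluated candidates, in place of a trained regressor."""
    evaluated = "\n".join(f"solution: {s} -> fitness: {f:.4f}"
                          for s, f in history)
    prompt = ("Previously evaluated solutions:\n" + evaluated +
              "\n\nRank these new solutions from best to worst "
              "expected fitness:\n" +
              "\n".join(str(c) for c in candidates))
    return llm.complete(prompt)  # the ranking is parsed downstream
```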
arXiv Detail & Related papers (2024-06-15T15:54:00Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically find new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
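Such an adaptive blend can be pictured as a gated mixture of the two losses over the preference margin. The sketch below is one plausible form under that reading; the sigmoid gate and its temperature are assumptions, not the paper's exact formula:

```python
import torch
import torch.nn.functional as F

def blended_preference_loss(rho, tau=0.05):
    """rho: beta-scaled policy-vs-reference log-ratio margin per example."""
    logistic = -F.logsigmoid(rho)    # DPO-style log-sigmoid loss
    exponential = torch.exp(-rho)    # exponential preference loss
    gate = torch.sigmoid(rho / tau)  # soft switch between the two regimes
    return (gate * logistic + (1.0 - gate) * exponential).mean()
```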
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics [0.023020018305241332]
This paper introduces a novel Large Language Model Evolutionary Algorithm (LLaMEA) framework.
Given a set of criteria and a task definition (the search space), LLaMEA iteratively generates, mutates and selects algorithms.
We show how this framework can be used to generate novel black-box metaheuristic optimization algorithms automatically.
arXiv Detail & Related papers (2024-05-30T15:10:59Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
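The DPO objective referenced here (and used by Self-Developing above) has a simple closed form; a standard implementation looks like this:

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization (Rafailov et al., 2023). Inputs are
    summed log-probabilities of the chosen/rejected responses under the
    trainable policy (pi_*) and the frozen reference model (ref_*)."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```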
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Designing Network Algorithms via Large Language Models [11.055072300500104]
We introduce NADA, the first framework to autonomously design network algorithms by leveraging the generative capabilities of large language models (LLMs).
We demonstrate that NADA produces novel ABR algorithms that consistently outperform the original algorithm in diverse network environments, including broadband, satellite, 4G, and 5G.
arXiv Detail & Related papers (2024-04-02T03:43:55Z) - Evolutionary Optimization of Model Merging Recipes [21.41838972039297]
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models.
We propose an evolutionary approach that overcomes the reliance on human intuition by automatically discovering effective combinations of diverse open-source models.
This work contributes new state-of-the-art models back to the open-source community, and also introduces a new paradigm for automated model composition.
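To make the idea concrete, evolving a merge recipe in parameter space can be as simple as mutating per-expert mixing coefficients. A toy sketch, where the fitness helper (assumed here) merges the experts with the given weights and scores the result:

```python
import random

def evolve_merge_recipe(experts, fitness, pop_size=20, generations=30,
                        sigma=0.1):
    """Toy (mu + lambda)-style evolution over per-expert mixing weights."""
    population = [[random.random() for _ in experts] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # best recipes first
        parents = population[: pop_size // 2]
        children = [[w + random.gauss(0.0, sigma)    # Gaussian mutation
                     for w in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)
```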
arXiv Detail & Related papers (2024-03-19T22:56:53Z) - Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language model (LLM)-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. We develop GPO, a capable gradient-inspired prompt optimizer.
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - Algorithm Evolution Using Large Language Model [18.03090066194074]
We propose a novel approach called Algorithm Evolution using Large Language Model (AEL).
AEL does algorithm-level evolution without model training.
Human effort and requirements for domain knowledge can be significantly reduced.
arXiv Detail & Related papers (2023-11-26T09:38:44Z) - STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
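A rough sketch of the Aug-GAM recipe under stated assumptions (embed_ngrams is a hypothetical helper returning one frozen-LLM embedding per n-gram of a text):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_aug_gam(train_texts, labels, embed_ngrams):
    """Represent each text as the sum of LLM embeddings of its n-grams and
    fit a linear head. Because the model is additive over n-grams, each
    n-gram's contribution can be precomputed into a lookup table, so no
    LLM call is needed at inference time."""
    X = np.stack([embed_ngrams(text).sum(axis=0) for text in train_texts])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```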
arXiv Detail & Related papers (2022-09-23T18:36:01Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)