Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search
- URL: http://arxiv.org/abs/2601.06845v1
- Date: Sun, 11 Jan 2026 10:21:22 GMT
- Title: Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search
- Authors: Ping Guo, Chao Li, Yinglan Feng, Chaoning Zhang,
- Abstract summary: We show that evolutionary search can effectively synthesize interpretable control policies in the form of executable code.<n>We implement our approach using EvoToolkit, a framework that seamlessly integrates evolutionary computation with customizable fitness evaluation.<n>This work highlights the potential of combining foundation models with evolutionary computation for trustworthy control policies in autonomous systems.
- Score: 15.216159860533397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing effective control policies for autonomous systems remains a fundamental challenge, traditionally addressed through reinforcement learning or manual engineering. While reinforcement learning has achieved remarkable success, it often suffers from high sample complexity, reward shaping difficulties, and produces opaque neural network policies that are hard to interpret or verify. Manual design, on the other hand, requires substantial domain expertise and struggles to scale across diverse tasks. In this work, we demonstrate that LLM-driven evolutionary search can effectively synthesize interpretable control policies in the form of executable code. By treating policy synthesis as a code evolution problem, we harness the LLM's prior knowledge of programming patterns and control heuristics while employing evolutionary search to explore the solution space systematically. We implement our approach using EvoToolkit, a framework that seamlessly integrates LLM-driven evolution with customizable fitness evaluation. Our method iteratively evolves populations of candidate policy programs, evaluating them against task-specific objectives and selecting superior individuals for reproduction. This process yields compact, human-readable control policies that can be directly inspected, modified, and formally verified. This work highlights the potential of combining foundation models with evolutionary computation for synthesizing trustworthy control policies in autonomous systems. Code is available at https://github.com/pgg3/EvoControl.
Related papers
- PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution [64.15555230987222]
PACEvolve is a framework designed to robustly govern the agent's context and search dynamics.<n>We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement.
arXiv Detail & Related papers (2026-01-15T18:25:23Z) - Policy-Conditioned Policies for Multi-Agent Task Solving [53.67744322553693]
In this work, we propose a paradigm shift that bridges the gap by representing policies as human-interpretable source code.<n>We reformulate the learning problem by utilizing Large Language Models (LLMs) as approximate interpreters.<n>We formalize this process as textitProgrammatic Iterated Best Response (PIBR), an algorithm where the policy code is optimized by textual gradients.
arXiv Detail & Related papers (2025-12-24T07:42:10Z) - EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning [63.03672166010434]
We introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework.<n>It jointly synthesizes problems, diverse candidate solutions, and verification artifacts.<n>It iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.
arXiv Detail & Related papers (2025-10-20T11:56:35Z) - Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies [36.44665658496622]
This work introduces a novel approach for programmatic control policy discovery, called Multimodal Large Language Model-assisted Evolutionary Search (MLES)<n>MLES utilizes multimodal large language models as programmatic policy generators, combining them with evolutionary search to automate policy generation.<n> Experimental results demonstrate that MLES achieves performance comparable to Proximal Policy Optimization (PPO) across two standard control tasks.
arXiv Detail & Related papers (2025-08-07T14:24:03Z) - Synthesizing Interpretable Control Policies through Large Language Model Guided Search [7.706225175516503]
We represent control policies as programs in standard languages like Python.<n>We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM.<n>We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and the ball in cup tasks.
arXiv Detail & Related papers (2024-10-07T18:12:20Z) - Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Octopus: Embodied Vision-Language Programmer from Environmental Feedback [58.04529328728999]
Embodied vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.
To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation.
Octopus is designed to 1) proficiently comprehend an agent's visual and textual task objectives, 2) formulate intricate action sequences, and 3) generate executable code.
arXiv Detail & Related papers (2023-10-12T17:59:58Z) - ControlVAE: Model-Based Learning of Generative Controllers for
Physics-Based Characters [28.446959320429656]
We introduce ControlVAE, a model-based framework for learning generative motion control policies based on variational autoencoders (VAE)
Our framework can learn a rich and flexible latent representation of skills and a skill-conditioned generative control policy from a diverse set of unorganized motion sequences.
We demonstrate the effectiveness of ControlVAE using a diverse set of tasks, which allows realistic and interactive control of the simulated characters.
arXiv Detail & Related papers (2022-10-12T10:11:36Z) - Collision-Free Flocking with a Dynamic Squad of Fixed-Wing UAVs Using
Deep Reinforcement Learning [2.555094847583209]
We deal with the decentralized leader-follower flocking control problem through deep reinforcement learning (DRL)
We propose a novel reinforcement learning algorithm CACER-II for training a shared control policy for all the followers.
As a result, the variable-length system state can be encoded into a fixed-length embedding vector, which makes the learned DRL policies independent with the number or the order of followers.
arXiv Detail & Related papers (2021-01-20T11:23:35Z) - AirCapRL: Autonomous Aerial Human Motion Capture using Deep
Reinforcement Learning [38.429105809093116]
We introduce a deep reinforcement learning (RL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap)
We focus on vision-based MoCap, where the objective is to estimate the trajectory of body pose and shape a single moving person using multiple aerial vehicles.
arXiv Detail & Related papers (2020-07-13T12:30:31Z) - Evolutionary Stochastic Policy Distillation [139.54121001226451]
We propose a new method called Evolutionary Policy Distillation (ESPD) to solve GCRS tasks.
ESPD enables a target policy to learn from a series of its variants through the technique of policy distillation (PD)
The experiments based on the MuJoCo control suite show the high learning efficiency of the proposed method.
arXiv Detail & Related papers (2020-04-27T16:19:25Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.