Synthesizing Interpretable Control Policies through Large Language Model Guided Search
- URL: http://arxiv.org/abs/2410.05406v1
- Date: Mon, 7 Oct 2024 18:12:20 GMT
- Title: Synthesizing Interpretable Control Policies through Large Language Model Guided Search
- Authors: Carlo Bosio, Mark W. Mueller
- Abstract summary: We represent control policies as programs in standard languages like Python.
We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM.
We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and ball-in-cup tasks.
- Score: 7.706225175516503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of Large Language Models (LLMs), systematic evaluation, and evolutionary algorithms has enabled breakthroughs in combinatorial optimization and scientific discovery. We propose to extend this powerful combination to the control of dynamical systems, generating interpretable control policies capable of complex behaviors. With our novel method, we represent control policies as programs in standard languages like Python. We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM. Unlike conventional learning-based control techniques, which rely on black-box neural networks to encode control policies, our approach enhances transparency and interpretability. We still take advantage of the power of large AI models, but leverage it at the policy design phase, ensuring that all system components remain interpretable and easily verifiable at runtime. Additionally, the use of standard programming languages makes it straightforward for humans to fine-tune or adapt the controllers based on their expertise and intuition. We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and ball-in-cup tasks. We make the code available at https://github.com/muellerlab/synthesizing_interpretable_control_policies.git
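The loop the abstract describes (propose a program, score it in simulation, let the LLM mutate the best candidates) is compact enough to sketch. The snippet below is a minimal, hypothetical reconstruction, not the released implementation: `query_llm` stands in for any pre-trained LLM API, and the pendulum simulator, reward, and population size are our own illustrative choices.

```python
import numpy as np

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pre-trained LLM; any completions API would do."""
    raise NotImplementedError

def evaluate(policy_src: str, T: int = 400, dt: float = 0.05) -> float:
    """Roll out a candidate controller on pendulum swing-up and return its score."""
    scope = {"np": np}
    exec(policy_src, scope)                 # must define policy(theta, theta_dot) -> torque
    policy = scope["policy"]
    g, m, l = 9.81, 1.0, 1.0
    theta, theta_dot, reward = np.pi, 0.0, 0.0   # start hanging down; theta = 0 is upright
    for _ in range(T):
        u = float(np.clip(policy(theta, theta_dot), -2.0, 2.0))
        theta_dot += (g / l * np.sin(theta) + u / (m * l**2)) * dt
        theta += theta_dot * dt
        err = (theta + np.pi) % (2 * np.pi) - np.pi   # angular distance from upright
        reward -= err**2 + 0.1 * theta_dot**2 + 0.001 * u**2
    return reward

seed = "def policy(theta, theta_dot):\n    return -2.0 * theta - 0.5 * theta_dot\n"
population = [(evaluate(seed), seed)]
for _ in range(50):
    best_score, best_src = max(population)
    child = query_llm(
        f"This controller scored {best_score:.1f} on pendulum swing-up. "
        f"Return an improved `policy` function:\n{best_src}"
    )
    try:
        population.append((evaluate(child), child))
    except Exception:
        pass                                # discard candidate programs that crash
    population = sorted(population)[-10:]   # keep the ten best programs
```

In a real run the prompt would typically also carry a task description and several top-scoring programs, and each LLM reply would be parsed to extract only the code block.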
Related papers
- Aligning Large Language Models with Representation Editing: A Control Perspective [38.71496554018039]
Fine-tuning large language models (LLMs) to align with human objectives is crucial for real-world applications.
Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model.
We propose aligning LLMs through representation editing.
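As a rough illustration of editing representations at inference time (a generic activation-steering sketch, not necessarily this paper's control-theoretic formulation): add a fixed steering vector to one transformer block's hidden states while generating. The layer index and random steering direction below are arbitrary placeholders.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

steer = 0.1 * torch.randn(model.config.n_embd)   # placeholder steering direction

def edit_hidden(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # adding a fixed vector nudges the representation at every position.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(edit_hidden)  # arbitrary layer
ids = tok("The assistant should", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0]))
```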
arXiv Detail & Related papers (2024-06-10T01:21:31Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
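A hedged sketch of the distillation recipe for the GBM case: gather state-action pairs from the expert and fit the interpretable model by behavioral cloning. The toy expert, the two-dimensional state, and the uniform state sampling below are stand-ins for the paper's RL-trained policies and rollout data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def expert_policy(state):                  # stand-in for a trained neural RL policy
    return -1.5 * state[0] - 0.4 * state[1]

# 1. Build a supervised dataset of (state, action) pairs. Here states are sampled
#    uniformly; in practice they would come from expert rollouts.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(5000, 2))
actions = np.array([expert_policy(s) for s in states])

# 2. Distill: fit an interpretable gradient boosting model by behavioral cloning.
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3)
gbm.fit(states, actions)

# 3. The distilled controller can now replace the network at runtime.
print(gbm.predict(states[:1]), expert_policy(states[0]))
```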
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Dimensionless Policies based on the Buckingham $\pi$ Theorem: Is This a Good Way to Generalize Numerical Results? [66.52698983694613]
This article explores the use of the Buckingham $\pi$ theorem as a tool to encode the control policies of physical systems into a generic form of knowledge.
We show, by restating the solution to a motion control problem using dimensionless variables, that (1) the policy mapping involves a reduced number of parameters and (2) control policies generated numerically for a specific system can be transferred exactly to a subset of dimensionally similar systems by scaling the input and output variables appropriately.
It remains to be seen how practical this approach can be to generalize policies for more complex high-dimensional problems, but the early results show that it is a…
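The scaling claim can be made concrete with a worked example. For a torque-controlled pendulum (our own example, not necessarily the system used in the article), the angle is already dimensionless, angular velocity scales with sqrt(g/l), and torque with m*g*l, so a policy tuned on one pendulum transfers exactly to any dimensionally similar one:

```python
import numpy as np

def transfer_policy(policy_a, m_a, l_a, m_b, l_b, g=9.81):
    """Map a pendulum controller tuned on system A onto a dimensionally similar system B."""
    w_a, w_b = np.sqrt(g / l_a), np.sqrt(g / l_b)   # characteristic frequencies
    tau_a, tau_b = m_a * g * l_a, m_b * g * l_b     # characteristic torques

    def policy_b(theta, theta_dot):
        # Non-dimensionalize B's state, evaluate A's policy, re-dimensionalize the torque.
        u_star = policy_a(theta, theta_dot * w_a / w_b) / tau_a
        return u_star * tau_b

    return policy_b

# Example: a policy tuned on a 1 kg, 1 m pendulum reused on a 2 kg, 0.5 m pendulum.
policy_small = lambda th, thd: -5.0 * th - 1.0 * thd
policy_big = transfer_policy(policy_small, m_a=1.0, l_a=1.0, m_b=2.0, l_b=0.5)
print(policy_big(0.3, 0.0))
```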
arXiv Detail & Related papers (2023-07-29T00:51:26Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
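A minimal reproduction of that sequence-completion setup, using an off-the-shelf model purely for illustration (GPT-2 here, chosen only because it runs locally):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# Serialize a simple pattern as text and let the LLM continue it autoregressively.
prompt = "0, 1, 2, 3, 4, 5, 6, 7, 8,"
out = generate(prompt, max_new_tokens=8, do_sample=False)
print(out[0]["generated_text"])   # a capable model tends to continue "9, 10, ..."
```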
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Using Simulation Optimization to Improve Zero-shot Policy Transfer of Quadrotors [0.14999444543328289]
We show that it is possible to train low-level control policies with reinforcement learning entirely in simulation and deploy them on a quadrotor robot without using real-world data to fine-tune.
Our neural network-based policies use only onboard sensor data and run entirely on the embedded drone hardware.
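The summary does not spell out the transfer recipe, so the sketch below shows one standard ingredient of zero-shot sim-to-real pipelines, domain randomization; the parameter names and ranges are assumptions, and the paper's actual simulation-optimization procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sim_params():
    """Randomize quadrotor dynamics each episode so the learned policy is robust."""
    return {
        "mass": rng.uniform(0.7, 1.3),         # kg, +/-30% around nominal
        "motor_lag": rng.uniform(0.02, 0.08),  # s, first-order motor time constant
        "drag_coeff": rng.uniform(0.0, 0.15),
        "imu_noise_std": rng.uniform(0.0, 0.02),
    }

for episode in range(1000):
    params = sample_sim_params()
    # env = QuadrotorSim(**params)            # hypothetical simulator class
    # run_training_episode(policy, env)       # hypothetical RL update
```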
arXiv Detail & Related papers (2022-01-04T22:32:05Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search framework for model predictive control (MPC).
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
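A compact sketch of the formulation: treat hard-to-optimize MPC quantities as the output of a learnable high-level parameter vector and improve it from episodic returns. Here the tuned quantities are MPC cost weights and the optimizer is plain random search, both of which are our assumptions rather than the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def episode_return(mpc_weights):
    """Run one closed-loop episode with an MPC using these cost weights (stub)."""
    # Stand-in objective; in practice, roll out the MPC-controlled system here.
    target = np.array([10.0, 1.0, 0.1])
    return -np.sum((mpc_weights - target) ** 2)

theta = np.ones(3)                      # initial high-level decision variables
for _ in range(200):
    candidate = theta + 0.1 * rng.standard_normal(3)
    if episode_return(candidate) > episode_return(theta):
        theta = candidate               # keep the perturbation if it helps
print("tuned MPC weights:", theta)
```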
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
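The phrase "predictions in trajectory distribution space" can be unpacked with a toy version of the idea: rather than emitting raw actions, the network predicts parameters of a second-order attractor system (a dynamic-movement-primitive-style layer, simplified here) that is integrated into a smooth trajectory. Everything below, including the gains and phase dynamics, is an illustrative stand-in.

```python
import numpy as np

def rollout_attractor(goal, forcing_weights, centers, widths,
                      y0=0.0, dt=0.01, steps=300, alpha=25.0):
    """Integrate a DMP-style system whose parameters a network would predict."""
    beta = alpha / 4.0                     # critically damped attractor gains
    y, yd, x = y0, 0.0, 1.0                # position, velocity, decaying phase
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)            # radial basis features
        f = (psi @ forcing_weights) / (psi.sum() + 1e-10) * x  # learned forcing term
        ydd = alpha * (beta * (goal - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt                 # phase dynamics drive f to zero over time
        traj.append(y)
    return np.array(traj)

# e.g. a smooth reach toward goal=1.0 with zero forcing:
print(rollout_attractor(1.0, np.zeros(10), np.linspace(0, 1, 10), np.full(10, 50.0))[-1])
```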
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- AirCapRL: Autonomous Aerial Human Motion Capture using Deep Reinforcement Learning [38.429105809093116]
We introduce a deep reinforcement learning (RL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap).
We focus on vision-based MoCap, where the objective is to estimate the trajectory of the body pose and shape of a single moving person using multiple aerial vehicles.
arXiv Detail & Related papers (2020-07-13T12:30:31Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
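A minimal sketch of a particle-based action policy for a single action dimension, under our own assumptions: the network outputs logits over a fixed set of particles tiling the action space, and actions are sampled from the induced mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed particles tiling a 1-D action space; the policy network outputs their logits.
particles = np.linspace(-1.0, 1.0, 16)
sigma = 0.05                               # assumed per-particle exploration noise

def sample_action(logits):
    """Sample from the mixture defined by particle weights (softmax of logits)."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    i = rng.choice(len(particles), p=w)    # pick a particle
    return particles[i] + sigma * rng.standard_normal()   # jitter around it

print(sample_action(rng.standard_normal(16)))
```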
arXiv Detail & Related papers (2020-03-16T00:35:36Z)