Synthesizing Interpretable Control Policies through Large Language Model Guided Search
- URL: http://arxiv.org/abs/2410.05406v1
- Date: Mon, 7 Oct 2024 18:12:20 GMT
- Title: Synthesizing Interpretable Control Policies through Large Language Model Guided Search
- Authors: Carlo Bosio, Mark W. Mueller
- Abstract summary: We represent control policies as programs in standard languages like Python.
We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM.
We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and ball-in-cup tasks.
- Score: 7.706225175516503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of Large Language Models (LLMs), systematic evaluation, and evolutionary algorithms has enabled breakthroughs in combinatorial optimization and scientific discovery. We propose to extend this powerful combination to the control of dynamical systems, generating interpretable control policies capable of complex behaviors. With our novel method, we represent control policies as programs in standard languages like Python. We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM. Unlike conventional learning-based control techniques, which rely on black-box neural networks to encode control policies, our approach enhances transparency and interpretability. We still take advantage of the power of large AI models, but leverage it at the policy design phase, ensuring that all system components remain interpretable and easily verifiable at runtime. Additionally, the use of standard programming languages makes it straightforward for humans to fine-tune or adapt the controllers based on their expertise and intuition. We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and ball-in-cup tasks. We make the code available at https://github.com/muellerlab/synthesizing_interpretable_control_policies.git
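The loop the abstract describes (propose a program, score it in simulation, let the LLM mutate the best candidates) is compact enough to sketch. The snippet below is a minimal, hypothetical reconstruction, not the released implementation: `query_llm` stands in for any pre-trained LLM API, and the pendulum simulator, reward, and population size are our own illustrative choices.

```python
import numpy as np

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pre-trained LLM; any completions API would do."""
    raise NotImplementedError

def evaluate(policy_src: str, T: int = 400, dt: float = 0.05) -> float:
    """Roll out a candidate controller on pendulum swing-up and return its score."""
    scope = {"np": np}
    exec(policy_src, scope)                 # must define policy(theta, theta_dot) -> torque
    policy = scope["policy"]
    g, m, l = 9.81, 1.0, 1.0
    theta, theta_dot, reward = np.pi, 0.0, 0.0   # start hanging down; theta = 0 is upright
    for _ in range(T):
        u = float(np.clip(policy(theta, theta_dot), -2.0, 2.0))
        theta_dot += (g / l * np.sin(theta) + u / (m * l**2)) * dt
        theta += theta_dot * dt
        err = (theta + np.pi) % (2 * np.pi) - np.pi   # angular distance from upright
        reward -= err**2 + 0.1 * theta_dot**2 + 0.001 * u**2
    return reward

seed = "def policy(theta, theta_dot):\n    return -2.0 * theta - 0.5 * theta_dot\n"
population = [(evaluate(seed), seed)]
for _ in range(50):
    best_score, best_src = max(population)
    child = query_llm(
        f"This controller scored {best_score:.1f} on pendulum swing-up. "
        f"Return an improved `policy` function:\n{best_src}"
    )
    try:
        population.append((evaluate(child), child))
    except Exception:
        pass                                # discard candidate programs that crash
    population = sorted(population)[-10:]   # keep the ten best programs
```

In a real run the prompt would typically also carry a task description and several top-scoring programs, and each LLM reply would be parsed to extract only the code block.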
Related papers
- Aligning Large Language Models with Representation Editing: A Control Perspective [38.71496554018039]
Fine-tuning large language models (LLMs) to align with human objectives is crucial for real-world applications.
Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model.
We propose aligning LLMs through representation editing.
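As a rough illustration of editing representations at inference time (a generic activation-steering sketch, not necessarily this paper's control-theoretic formulation): add a fixed steering vector to one transformer block's hidden states while generating. The layer index and random steering direction below are arbitrary placeholders.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

steer = 0.1 * torch.randn(model.config.n_embd)   # placeholder steering direction

def edit_hidden(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # adding a fixed vector nudges the representation at every position.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(edit_hidden)  # arbitrary layer
ids = tok("The assistant should", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0]))
```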
arXiv Detail & Related papers (2024-06-10T01:21:31Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
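A hedged sketch of the distillation recipe for the GBM case: gather state-action pairs from the expert and fit the interpretable model by behavioral cloning. The toy expert, the two-dimensional state, and the uniform state sampling below are stand-ins for the paper's RL-trained policies and rollout data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def expert_policy(state):                  # stand-in for a trained neural RL policy
    return -1.5 * state[0] - 0.4 * state[1]

# 1. Build a supervised dataset of (state, action) pairs. Here states are sampled
#    uniformly; in practice they would come from expert rollouts.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(5000, 2))
actions = np.array([expert_policy(s) for s in states])

# 2. Distill: fit an interpretable gradient boosting model by behavioral cloning.
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3)
gbm.fit(states, actions)

# 3. The distilled controller can now replace the network at runtime.
print(gbm.predict(states[:1]), expert_policy(states[0]))
```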
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Dimensionless Policies based on the Buckingham $\pi$ Theorem: Is This a Good Way to Generalize Numerical Results? [66.52698983694613]
This article explores the use of the Buckingham $\pi$ theorem as a tool to encode the control policies of physical systems into a generic form of knowledge.
We show, by restating the solution to a motion control problem using dimensionless variables, that (1) the policy mapping involves a reduced number of parameters and (2) control policies generated numerically for a specific system can be transferred exactly to a subset of dimensionally similar systems by scaling the input and output variables appropriately.
It remains to be seen how practical this approach can be to generalize policies for more complex high-dimensional problems, but the early results show that it is a…
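The scaling claim can be made concrete with a worked example. For a torque-controlled pendulum (our own example, not necessarily the system used in the article), the angle is already dimensionless, angular velocity scales with sqrt(g/l), and torque with m*g*l, so a policy tuned on one pendulum transfers exactly to any dimensionally similar one:

```python
import numpy as np

def transfer_policy(policy_a, m_a, l_a, m_b, l_b, g=9.81):
    """Map a pendulum controller tuned on system A onto a dimensionally similar system B."""
    w_a, w_b = np.sqrt(g / l_a), np.sqrt(g / l_b)   # characteristic frequencies
    tau_a, tau_b = m_a * g * l_a, m_b * g * l_b     # characteristic torques

    def policy_b(theta, theta_dot):
        # Non-dimensionalize B's state, evaluate A's policy, re-dimensionalize the torque.
        u_star = policy_a(theta, theta_dot * w_a / w_b) / tau_a
        return u_star * tau_b

    return policy_b

# Example: a policy tuned on a 1 kg, 1 m pendulum reused on a 2 kg, 0.5 m pendulum.
policy_small = lambda th, thd: -5.0 * th - 1.0 * thd
policy_big = transfer_policy(policy_small, m_a=1.0, l_a=1.0, m_b=2.0, l_b=0.5)
print(policy_big(0.3, 0.0))
```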
arXiv Detail & Related papers (2023-07-29T00:51:26Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
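A minimal reproduction of that sequence-completion setup, using an off-the-shelf model purely for illustration (GPT-2 here, chosen only because it runs locally):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# Serialize a simple pattern as text and let the LLM continue it autoregressively.
prompt = "0, 1, 2, 3, 4, 5, 6, 7, 8,"
out = generate(prompt, max_new_tokens=8, do_sample=False)
print(out[0]["generated_text"])   # a capable model tends to continue "9, 10, ..."
```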
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Using Simulation Optimization to Improve Zero-shot Policy Transfer of Quadrotors [0.14999444543328289]
We show that it is possible to train low-level control policies with reinforcement learning entirely in simulation and deploy them on a quadrotor robot without using real-world data to fine-tune.
Our neural network-based policies use only onboard sensor data and run entirely on the embedded drone hardware.
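The summary does not spell out the transfer recipe, so the sketch below shows one standard ingredient of zero-shot sim-to-real pipelines, domain randomization; the parameter names and ranges are assumptions, and the paper's actual simulation-optimization procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sim_params():
    """Randomize quadrotor dynamics each episode so the learned policy is robust."""
    return {
        "mass": rng.uniform(0.7, 1.3),         # kg, +/-30% around nominal
        "motor_lag": rng.uniform(0.02, 0.08),  # s, first-order motor time constant
        "drag_coeff": rng.uniform(0.0, 0.15),
        "imu_noise_std": rng.uniform(0.0, 0.02),
    }

for episode in range(1000):
    params = sample_sim_params()
    # env = QuadrotorSim(**params)            # hypothetical simulator class
    # run_training_episode(policy, env)       # hypothetical RL update
```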
arXiv Detail & Related papers (2022-01-04T22:32:05Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search framework for model predictive control (MPC).
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
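A compact sketch of the formulation: treat hard-to-optimize MPC quantities as the output of a learnable high-level parameter vector and improve it from episodic returns. Here the tuned quantities are MPC cost weights and the optimizer is plain random search, both of which are our assumptions rather than the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def episode_return(mpc_weights):
    """Run one closed-loop episode with an MPC using these cost weights (stub)."""
    # Stand-in objective; in practice, roll out the MPC-controlled system here.
    target = np.array([10.0, 1.0, 0.1])
    return -np.sum((mpc_weights - target) ** 2)

theta = np.ones(3)                      # initial high-level decision variables
for _ in range(200):
    candidate = theta + 0.1 * rng.standard_normal(3)
    if episode_return(candidate) > episode_return(theta):
        theta = candidate               # keep the perturbation if it helps
print("tuned MPC weights:", theta)
```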
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
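The phrase "predictions in trajectory distribution space" can be unpacked with a toy version of the idea: rather than emitting raw actions, the network predicts parameters of a second-order attractor system (a dynamic-movement-primitive-style layer, simplified here) that is integrated into a smooth trajectory. Everything below, including the gains and phase dynamics, is an illustrative stand-in.

```python
import numpy as np

def rollout_attractor(goal, forcing_weights, centers, widths,
                      y0=0.0, dt=0.01, steps=300, alpha=25.0):
    """Integrate a DMP-style system whose parameters a network would predict."""
    beta = alpha / 4.0                     # critically damped attractor gains
    y, yd, x = y0, 0.0, 1.0                # position, velocity, decaying phase
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)            # radial basis features
        f = (psi @ forcing_weights) / (psi.sum() + 1e-10) * x  # learned forcing term
        ydd = alpha * (beta * (goal - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt                 # phase dynamics drive f to zero over time
        traj.append(y)
    return np.array(traj)

# e.g. a smooth reach toward goal=1.0 with zero forcing:
print(rollout_attractor(1.0, np.zeros(10), np.linspace(0, 1, 10), np.full(10, 50.0))[-1])
```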
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- AirCapRL: Autonomous Aerial Human Motion Capture using Deep Reinforcement Learning [38.429105809093116]
We introduce a deep reinforcement learning (RL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap).
We focus on vision-based MoCap, where the objective is to estimate the trajectory of the body pose and shape of a single moving person using multiple aerial vehicles.
arXiv Detail & Related papers (2020-07-13T12:30:31Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
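A minimal sketch of a particle-based action policy for a single action dimension, under our own assumptions: the network outputs logits over a fixed set of particles tiling the action space, and actions are sampled from the induced mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed particles tiling a 1-D action space; the policy network outputs their logits.
particles = np.linspace(-1.0, 1.0, 16)
sigma = 0.05                               # assumed per-particle exploration noise

def sample_action(logits):
    """Sample from the mixture defined by particle weights (softmax of logits)."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    i = rng.choice(len(particles), p=w)    # pick a particle
    return particles[i] + sigma * rng.standard_normal()   # jitter around it

print(sample_action(rng.standard_normal(16)))
```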
arXiv Detail & Related papers (2020-03-16T00:35:36Z)