Learning of Generalizable and Interpretable Knowledge in Grid-Based
Reinforcement Learning Environments
- URL: http://arxiv.org/abs/2309.03651v1
- Date: Thu, 7 Sep 2023 11:46:57 GMT
- Title: Learning of Generalizable and Interpretable Knowledge in Grid-Based
Reinforcement Learning Environments
- Authors: Manuel Eberhardinger, Johannes Maucher, Setareh Maghsudi
- Abstract summary: We propose using program synthesis to imitate reinforcement learning policies.
We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments.
- Score: 5.217870815854702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the interactions of agents trained with deep reinforcement
learning is crucial for deploying agents in games or the real world. In the
former, unreasonable actions confuse players. In the latter, that effect is
even more significant, as unexpected behavior causes accidents with potentially
grave and long-lasting consequences for the individuals involved. In this work,
we propose using program synthesis to imitate reinforcement learning policies
after observing a trajectory of their action sequences. Programs have the advantage
that they are inherently interpretable and verifiable for correctness. We adapt
the state-of-the-art program synthesis system DreamCoder for learning concepts
in grid-based environments, specifically, a navigation task and two miniature
versions of Atari games, Space Invaders and Asterix. By inspecting the
generated libraries, we can make inferences about the concepts the black-box
agent has learned and better understand the agent's behavior. We achieve the
same by visualizing the agent's decision-making process for the imitated
sequences. We evaluate our approach with different types of program
synthesizers based on a search-only method, a neural-guided search, and a
language model fine-tuned on code.
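As a rough illustration of the idea (a toy sketch, not the paper's DreamCoder pipeline), imitating a policy by synthesis can be as small as enumerating a tiny domain-specific language until some program reproduces the observed state-action pairs. The DSL, state encoding, and trajectory below are entirely hypothetical:

```python
from itertools import product

# Hypothetical grid state encoding: (agent_x, agent_y, goal_x, goal_y).
DSL_CONDITIONS = {
    "goal_right": lambda s: s[2] > s[0],
    "goal_left": lambda s: s[2] < s[0],
    "goal_below": lambda s: s[3] > s[1],
}
ACTIONS = ["left", "right", "up", "down"]

def make_program(cond_name, then_act, else_act):
    """Build an 'if cond then act1 else act2' policy program."""
    cond = DSL_CONDITIONS[cond_name]
    def program(state):
        return then_act if cond(state) else else_act
    program.source = f"if {cond_name}: {then_act} else: {else_act}"
    return program

def synthesize(trajectory):
    """Enumerate the DSL until a program matches every (state, action) pair."""
    for cond_name, a1, a2 in product(DSL_CONDITIONS, ACTIONS, ACTIONS):
        prog = make_program(cond_name, a1, a2)
        if all(prog(s) == a for s, a in trajectory):
            return prog
    return None

# Imagined trajectory collected from a black-box agent.
trajectory = [((0, 0, 3, 0), "right"), ((5, 0, 3, 0), "left")]
prog = synthesize(trajectory)
print(prog.source if prog else "no consistent program in this DSL")
```

The synthesized program is itself the explanation: unlike the black-box policy, it can be read, tested on new states, and checked for correctness.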
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI exhibits remarkable improvements of 10% to 64% compared with the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained with imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering; a toy version of such a reward is sketched after this entry.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
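A minimal sketch of the image-based reward idea from the AnySkill entry above: score how well the current frame matches an open-vocabulary instruction and feed the similarity to the high-level policy as reward. The encoders here are fixed random-projection stand-ins (a real system would use learned CLIP-style encoders); only the reward plumbing is the point:

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)
_img_proj = rng.standard_normal((3 * 32 * 32, DIM))  # for 32x32 RGB frames
_txt_proj = rng.standard_normal((256, DIM))          # for hashed bag-of-words

def encode_image(frame):
    z = frame.reshape(-1) @ _img_proj
    return z / np.linalg.norm(z)

def encode_text(instruction):
    bag = np.zeros(256)
    for token in instruction.lower().split():
        bag[hash(token) % 256] += 1.0
    z = bag @ _txt_proj
    return z / np.linalg.norm(z)

def image_reward(frame, instruction):
    """Reward for the high-level policy: frame/instruction cosine similarity."""
    return float(encode_image(frame) @ encode_text(instruction))

frame = rng.random((3, 32, 32))  # placeholder rendered frame
print(image_reward(frame, "sit on the chair"))
```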
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback [58.04529328728999]
Embodied vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning, yet that reasoning is not directly grounded in executable actions.
To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation.
Octopus is designed to 1) proficiently comprehend an agent's visual and textual task objectives, 2) formulate intricate action sequences, and 3) generate executable code. A minimal code-as-action loop is sketched after this entry.
arXiv Detail & Related papers (2023-10-12T17:59:58Z)
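A hedged sketch of the code-as-medium idea from the Octopus entry above: a planner emits Python source, the agent executes it against an environment API, and the outcome becomes feedback. The `GridEnv` API and the canned `plan_with_vlm` stand-in are hypothetical, not the paper's interface:

```python
class GridEnv:
    """Hypothetical environment API exposed to generated code."""
    def __init__(self):
        self.pos, self.goal = 0, 3
    def move_right(self):
        self.pos += 1
    def at_goal(self):
        return self.pos == self.goal

def plan_with_vlm(task: str) -> str:
    # Stand-in for a VLM call: a real system would condition on pixels + text.
    return "while not env.at_goal():\n    env.move_right()"

def run_episode(task):
    env = GridEnv()
    code = plan_with_vlm(task)
    try:
        exec(code, {"env": env})      # execute the generated plan
        return "success" if env.at_goal() else "failure"
    except Exception as err:          # runtime errors become feedback
        return f"error: {err}"

print(run_episode("walk to the goal"))  # -> success
```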
- Action-Conditioned Contrastive Policy Pretraining [39.13710045468429]
Deep visuomotor policy learning achieves promising results in control tasks such as robotic manipulation and autonomous driving.
However, it requires a huge number of online interactions with the training environment, which limits its real-world application.
In this work, we aim to pretrain policy representations for driving tasks using hours-long uncurated YouTube videos; a generic contrastive objective is sketched after this entry.
arXiv Detail & Related papers (2022-04-05T17:58:22Z)
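A minimal sketch of a contrastive pretraining objective in the spirit of the entry above: an InfoNCE-style loss that pulls paired embeddings together and pushes the rest apart. The embeddings and pairing rule below are synthetic stand-ins, not the paper's action-conditioned construction:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (N, D) L2-normalized embeddings; row i of each is a positive pair."""
    logits = anchors @ positives.T / temperature   # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy against the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
pos = z + 0.05 * rng.standard_normal(z.shape)      # slightly perturbed positives
pos /= np.linalg.norm(pos, axis=1, keepdims=True)
print(info_nce(z, pos))  # low loss: each row's positive is its nearest match
```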
- A Survey on Reinforcement Learning Methods in Character Animation [22.3342752080749]
Reinforcement Learning is an area of Machine Learning focused on how agents can be trained to make sequential decisions.
This paper surveys the modern Deep Reinforcement Learning methods and discusses their possible applications in Character Animation.
arXiv Detail & Related papers (2022-03-07T23:39:00Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior. A minimal local-update illustration follows this entry.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
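A minimal, heavily simplified illustration of learning without backprop in the spirit of the entry above: a top-level state vector predicts the layer below it, and the weights update from that purely local prediction error (the actual ANGC agent is far more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 8))  # top state (4 units) -> prediction of input (8 units)

def local_update(W, z_top, x, lr=0.05):
    pred = z_top @ W                    # top-down prediction of the input layer
    err = x - pred                      # prediction error, computed locally
    W = W + lr * np.outer(z_top, err)   # Hebbian-style update: pre-activity * local error
    return W, float((err ** 2).mean())

x = rng.standard_normal(8)   # a fixed observation
z = rng.standard_normal(4)   # a fixed top-level state
for _ in range(200):
    W, mse = local_update(W, z, x)
print(f"final local prediction error: {mse:.6f}")  # shrinks with no backprop anywhere
```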
- Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations.
We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
- DeepSym: Deep Symbol Generation and Rule Learning from Unsupervised Continuous Robot Interaction for Planning [1.3854111346209868]
A robot arm-hand system learns symbols that can be interpreted as 'rollable', 'insertable', and 'larger-than' from its push and stack actions.
The system is verified in a physics-based 3D simulation environment; a toy symbol-binarization sketch follows this entry.
arXiv Detail & Related papers (2020-12-04T11:26:06Z)
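A toy sketch of the discrete-symbol idea from the DeepSym entry above: continuous object features pass through a binary bottleneck, and the resulting bit patterns serve as symbols. DeepSym learns this encoder end-to-end with a predictor; here the projection is fixed and hypothetical, to show only the symbol extraction:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((6, 3))  # object features (6) -> 3 latent units

def to_symbol(features):
    """Binarize latent units: each object maps to a 3-bit symbol."""
    return tuple((features @ W_enc > 0).astype(int))

objects = {"ball": rng.standard_normal(6), "box": rng.standard_normal(6)}
symbols = {name: to_symbol(f) for name, f in objects.items()}
print(symbols)  # e.g. {'ball': (1, 0, 1), 'box': (0, 1, 1)} — bits usable in rules
```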
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations; a toy goal-bucketed buffer is sketched after this entry.
The solution based on our algorithm beats all other solutions in the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
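A toy sketch of a goal-structured replay buffer in the spirit of the entry above (the actual algorithm's details differ): transitions are bucketed by the sub-goal they achieve, and demonstration data can be gradually evicted, i.e. "forgotten", once a sub-goal is mastered:

```python
from collections import defaultdict, deque

class GoalReplayBuffer:
    def __init__(self, per_goal_capacity=1000):
        self.buckets = defaultdict(lambda: deque(maxlen=per_goal_capacity))

    def add(self, transition, subgoal, from_demo):
        self.buckets[subgoal].append((transition, from_demo))

    def forget_demos(self, subgoal, keep_fraction=0.5):
        """Evict a share of demonstration transitions for a mastered sub-goal."""
        items = list(self.buckets[subgoal])
        demos = [it for it in items if it[1]]
        agent = [it for it in items if not it[1]]
        kept = demos[: int(len(demos) * keep_fraction)]
        self.buckets[subgoal] = deque(agent + kept, maxlen=self.buckets[subgoal].maxlen)

buf = GoalReplayBuffer()
buf.add(("s", "a", 1.0, "s2"), subgoal="get_wood", from_demo=True)
buf.add(("s2", "b", 0.0, "s3"), subgoal="get_wood", from_demo=False)
buf.forget_demos("get_wood", keep_fraction=0.0)
print(len(buf.buckets["get_wood"]))  # 1: only the agent's own transition remains
```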
- Learning to Simulate Dynamic Environments with GameGAN [109.25308647431952]
In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training.
arXiv Detail & Related papers (2020-05-25T14:10:17Z)
- Meta-learning curiosity algorithms [26.186627089223624]
We formulate the problem of generating curious behavior as one of meta-learning.
Our rich language of programs combines neural networks with other building blocks such as buffers, nearest-neighbor modules and custom loss functions.
We find two novel curiosity algorithms that perform on par with or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image inputs, acrobot, lunar lander, ant, and hopper; one such building block is sketched after this entry.
arXiv Detail & Related papers (2020-03-11T14:25:43Z)
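One hand-written instance of the building blocks named above (the paper instead searches over whole programs of such components): a nearest-neighbor novelty bonus computed over a buffer of visited-state embeddings:

```python
import numpy as np

class NearestNeighborCuriosity:
    def __init__(self):
        self.memory = []  # buffer of visited-state embeddings

    def intrinsic_reward(self, embedding):
        """Bonus = distance to the closest embedding seen so far."""
        if not self.memory:
            bonus = 1.0
        else:
            bonus = float(min(np.linalg.norm(embedding - m) for m in self.memory))
        self.memory.append(embedding.copy())
        return bonus

cur = NearestNeighborCuriosity()
rng = np.random.default_rng(0)
for _ in range(3):
    s = rng.standard_normal(4)            # stand-in state embedding
    print(round(cur.intrinsic_reward(s), 3))  # novel states earn larger bonuses
```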
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.