Composable Learning with Sparse Kernel Representations
- URL: http://arxiv.org/abs/2103.14474v2
- Date: Mon, 29 Mar 2021 16:14:00 GMT
- Title: Composable Learning with Sparse Kernel Representations
- Authors: Ekaterina Tolstaya, Ethan Stump, Alec Koppel, Alejandro Ribeiro
- Abstract summary: We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
- Score: 110.19179439773578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a reinforcement learning algorithm for learning sparse
non-parametric controllers in a Reproducing Kernel Hilbert Space. We improve
the sample complexity of this approach by imposing structure on the
state-action value function through a normalized advantage function (NAF). This
representation of the policy enables efficiently composing multiple learned
models without additional training samples or interaction with the environment.
We demonstrate the performance of this algorithm on learning obstacle-avoidance
policies in multiple simulations of a robot equipped with a laser scanner while
navigating in a 2D environment. We apply the composition operation to various
policy combinations and test them to show that the composed policies retain the
performance of their components. We also transfer the composed policy directly
to a physical platform operating in an arena with obstacles in order to
demonstrate a degree of generalization.
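As a rough illustration of the two ideas in the abstract, the sketch below (not the authors' code; the Gaussian kernel choice and the NAFModel/compose names are assumptions) writes Q(s, a) in the standard NAF form, with the value and policy-mean terms given by kernel expansions over a sparse dictionary, and composes two such models by summing their quadratic advantage terms. Summing quadratics yields a precision-weighted mean as the composed greedy action; the paper's actual composition operator may differ in detail.

```python
# Minimal sketch of a kernel-expanded normalized advantage function (NAF)
# and a summation-based composition of two such models. Illustrative only.
import numpy as np

def rbf(x, d, bw=1.0):
    """Gaussian kernel evaluations of state x against dictionary points d."""
    sq = np.sum((d - x) ** 2, axis=1)
    return np.exp(-sq / (2.0 * bw ** 2))

class NAFModel:
    """Q(s,a) = V(s) - 0.5 * (a - mu(s))^T P (a - mu(s)),
    with V and mu represented as kernel expansions over a sparse dictionary."""
    def __init__(self, dictionary, v_weights, mu_weights, precision):
        self.d = dictionary    # (m, state_dim) retained kernel centers
        self.wv = v_weights    # (m,) weights for the value term
        self.wmu = mu_weights  # (m, action_dim) weights for the policy mean
        self.P = precision     # (action_dim, action_dim), positive definite

    def value(self, s):
        return rbf(s, self.d) @ self.wv

    def mu(self, s):
        return rbf(s, self.d) @ self.wmu

    def q(self, s, a):
        delta = a - self.mu(s)
        return self.value(s) - 0.5 * delta @ self.P @ delta

def compose(models, s):
    """Greedy action of the summed Q-functions: each advantage is quadratic in a,
    so the sum is quadratic with a precision-weighted mean."""
    P_sum = sum(m.P for m in models)
    b = sum(m.P @ m.mu(s) for m in models)
    return np.linalg.solve(P_sum, b)

# Toy usage with random weights: compose two models without further training data.
rng = np.random.default_rng(0)
dict_pts = rng.normal(size=(5, 3))
m1 = NAFModel(dict_pts, rng.normal(size=5), rng.normal(size=(5, 2)), np.eye(2))
m2 = NAFModel(dict_pts, rng.normal(size=5), rng.normal(size=(5, 2)), 2 * np.eye(2))
s = rng.normal(size=3)
print(compose([m1, m2], s))
```

Because the composed action depends only on the stored dictionaries and weights, no additional environment interaction is needed to evaluate it, which is the property the abstract emphasizes.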
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- AGPNet -- Autonomous Grading Policy Network [0.5232537118394002]
We formalize the problem as a Markov Decision Process and design a simulation which demonstrates agent-environment interactions.
We use methods from reinforcement learning, behavior cloning and contrastive learning to train a hybrid policy.
Our trained agent, AGPNet, reaches human-level performance and outperforms current state-of-the-art machine learning methods for the autonomous grading task.
arXiv Detail & Related papers (2021-12-20T21:44:21Z)
- Learn Dynamic-Aware State Embedding for Transfer Learning [0.8756822885568589]
We consider the setting where all tasks (MDPs) share the same environment dynamics but differ in their reward functions.
In this setting, the MDP dynamics are useful knowledge to transfer, and they can be inferred with a uniformly random policy.
We observe that the binary MDP dynamics can be inferred from trajectories of any policy, which avoids the need for a uniformly random policy.
arXiv Detail & Related papers (2021-01-06T19:07:31Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
arXiv Detail & Related papers (2020-03-16T00:35:36Z)