Composable Learning with Sparse Kernel Representations
- URL: http://arxiv.org/abs/2103.14474v2
- Date: Mon, 29 Mar 2021 16:14:00 GMT
- Title: Composable Learning with Sparse Kernel Representations
- Authors: Ekaterina Tolstaya, Ethan Stump, Alec Koppel, Alejandro Ribeiro
- Abstract summary: We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
- Score: 110.19179439773578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a reinforcement learning algorithm for learning sparse
non-parametric controllers in a Reproducing Kernel Hilbert Space. We improve
the sample complexity of this approach by imposing structure on the
state-action value function through a normalized advantage function (NAF). This
representation of the policy enables efficiently composing multiple learned
models without additional training samples or interaction with the environment.
We demonstrate the performance of this algorithm on learning obstacle-avoidance
policies in multiple simulations of a robot equipped with a laser scanner while
navigating in a 2D environment. We apply the composition operation to various
policy combinations and test them to show that the composed policies retain the
performance of their components. We also transfer the composed policy directly
to a physical platform operating in an arena with obstacles in order to
demonstrate a degree of generalization.
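The composition property follows from the NAF's quadratic advantage: a sum of quadratic functions of the action is again quadratic, so the greedy action of a composed critic has a closed form and needs no further training. A minimal NumPy sketch under that assumption (the function name and the precision-matrix parameterization are illustrative, not the paper's code):

```python
import numpy as np

def compose_naf_actions(mus, Ps):
    """Greedy action of a sum of NAF critics.

    Each critic's advantage is quadratic,
        A_i(s, a) = -0.5 * (a - mu_i)^T P_i (a - mu_i),
    so the sum is again quadratic and its maximizer is the
    precision-weighted mean of the component greedy actions.
    """
    P_sum = np.sum(Ps, axis=0)                      # combined precision
    weighted = np.sum([P @ mu for P, mu in zip(Ps, mus)], axis=0)
    return np.linalg.solve(P_sum, weighted)         # (sum P_i)^-1 sum P_i mu_i
```

A critic with a larger precision matrix (a sharper advantage peak) pulls the composed action more strongly toward its own greedy action.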
Related papers
- From Imitation to Refinement -- Residual RL for Precise Visual Assembly [19.9786629249219]
Reinforcement learning allows policies to acquire locally corrective behaviors through task reward supervision and exploration.
This paper explores the use of RL fine-tuning to improve upon BC-trained policies in precise manipulation tasks.
We propose training residual policies on top of frozen BC-trained diffusion models using standard policy gradient methods and sparse rewards.
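The residual scheme described above can be sketched as an additive correction on top of a frozen base policy; only the residual term is trained. All names here are illustrative, not the paper's API:

```python
import numpy as np

def residual_action(state, base_policy, residual_policy, scale=0.1):
    """Combine a frozen base policy's action with a small learned residual.

    Only `residual_policy` is updated during fine-tuning (e.g. by a
    policy gradient on sparse task reward); the base policy stays frozen.
    `scale` bounds the size of the correction.
    """
    a_base = base_policy(state)        # frozen BC-trained policy
    a_res = residual_policy(state)     # learned corrective term
    return a_base + scale * a_res
```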
arXiv Detail & Related papers (2024-07-23T17:44:54Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
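A policy that matches the best combination of baselines can be pictured as per-state stitching: at each state, follow whichever baseline is estimated to do best from there. A sketch with hypothetical names (the value estimators would be learned from the K offline trajectory sets):

```python
import numpy as np

def best_baseline_policy(state, baselines, value_estimates):
    """Stitch K baseline policies into one.

    `value_estimates[k](state)` is an estimate of the return of baseline k
    starting from `state`; the stitched policy follows the baseline with the
    highest estimate, which can beat every individual baseline globally.
    """
    values = [v(state) for v in value_estimates]
    k = int(np.argmax(values))
    return baselines[k](state)
```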
arXiv Detail & Related papers (2024-03-28T14:34:02Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
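In the linear-architecture setting, Q-values are an inner product of a weight vector with state-action features, and learning is a TD(0) update on those weights. A minimal sketch of the basic protocol (parameter names are illustrative; the paper's exploration variant adds more on top):

```python
import numpy as np

def linear_q_update(theta, phi_sa, reward, phi_next_best, alpha=0.1, gamma=0.99):
    """One Q-learning step with linear function approximation.

    Q(s, a) = theta . phi(s, a); `phi_next_best` is the feature vector of
    the greedy action at the next state.
    """
    td_target = reward + gamma * (theta @ phi_next_best)
    td_error = td_target - theta @ phi_sa
    return theta + alpha * td_error * phi_sa   # gradient step on the weights
```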
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- AGPNet -- Autonomous Grading Policy Network [0.5232537118394002]
We formalize the problem as a Markov Decision Process and design a simulation which demonstrates agent-environment interactions.
We use methods from reinforcement learning, behavior cloning and contrastive learning to train a hybrid policy.
Our trained agent, AGPNet, reaches human-level performance and outperforms current state-of-the-art machine learning methods for the autonomous grading task.
arXiv Detail & Related papers (2021-12-20T21:44:21Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- Learn Dynamic-Aware State Embedding for Transfer Learning [0.8756822885568589]
We consider the setting where all tasks (MDPs) share the same environment dynamics and differ only in their reward functions.
In this setting, the MDP dynamics are useful knowledge to transfer, and they can be inferred from a uniformly random policy.
We observe that the binary MDP dynamics can be inferred from the trajectories of any policy, which avoids the need for a uniformly random policy.
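The observation above can be sketched concretely: the binary dynamics only record *whether* a transition exists, so observing it once under any behavior policy is enough. A minimal sketch for a tabular MDP (names are illustrative):

```python
import numpy as np

def binary_dynamics_from_trajectories(trajectories, n_states):
    """Estimate a binary reachability matrix D[s, s'] = 1 iff the
    transition s -> s' was observed in any trajectory.

    Because only the existence of each transition matters, trajectories
    from any behavior policy suffice; a uniformly random policy is not
    required.
    """
    D = np.zeros((n_states, n_states), dtype=int)
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            D[s, s_next] = 1
    return D
```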
arXiv Detail & Related papers (2021-01-06T19:07:31Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
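A particle-based action policy replaces the usual Gaussian head with a discrete distribution over fixed action "particles" per action dimension. A minimal sampling sketch under that assumption (names are illustrative, not PFPN's code):

```python
import numpy as np

def sample_particle_action(particles, logits, rng=None):
    """Sample one action dimension from a particle-based policy.

    `particles` is an (n,) array of candidate action values and `logits`
    their unnormalized log-probabilities; the network would output the
    logits, and the particles stay fixed.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    idx = rng.choice(len(particles), p=probs)
    return particles[idx]
```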
arXiv Detail & Related papers (2020-03-16T00:35:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.