Stay Alive with Many Options: A Reinforcement Learning Approach for
Autonomous Navigation
- URL: http://arxiv.org/abs/2102.00168v1
- Date: Sat, 30 Jan 2021 06:55:35 GMT
- Title: Stay Alive with Many Options: A Reinforcement Learning Approach for
Autonomous Navigation
- Authors: Ambedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari, Dhaval
Parmar Udaybhai
- Abstract summary: We introduce an alternative approach to sequentially learn such skills without using an overarching hierarchical policy.
We demonstrate the utility of our approach in a simulated 3D navigation environment which we have built.
- Score: 5.811502603310248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning approaches learn policies based on
hierarchical decision structures. However, training such methods in practice
may lead to poor generalization, with either sub-policies executing actions for
too few time steps or devolving into a single policy altogether. In our work,
we introduce an alternative approach to sequentially learn such skills without
using an overarching hierarchical policy, in the context of environments in
which an objective of the agent is to prolong the episode for as long as
possible, or in other words `stay alive'. We demonstrate the utility of our
approach in a simulated 3D navigation environment which we have built. We show
that our method outperforms prior methods such as Soft Actor Critic and Soft
Option Critic on our environment, as well as the Atari River Raid environment.
Related papers
- Language-Conditioned Semantic Search-Based Policy for Robotic
Manipulation Tasks [2.1332830068386217]
We propose a language-conditioned semantic search-based method to produce an online search-based policy.
Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities.
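A rough sketch of what a search-based policy of this kind can look like (the embedding table, memory contents, and retrieval rule below are hypothetical stand-ins, not the paper's system): the instruction is embedded, the closest stored demonstration is retrieved by cosine similarity, and its actions are reused.

    # Hypothetical sketch of language-conditioned retrieval; random vectors stand
    # in for a real sentence encoder and the stored trajectories are made up.
    import numpy as np

    rng = np.random.default_rng(1)
    embed = {  # instruction -> embedding (stand-in for a learned encoder)
        "open the drawer":  rng.normal(size=16),
        "close the drawer": rng.normal(size=16),
        "push the block":   rng.normal(size=16),
    }
    memory = [  # (instruction, action trajectory) pairs collected beforehand
        ("open the drawer", [[0.1, 0.0], [0.2, 0.0]]),
        ("push the block",  [[0.0, 0.3], [0.0, 0.4]]),
    ]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search_policy(instruction):
        """Return the trajectory whose stored instruction is semantically closest."""
        query = embed[instruction]
        scores = [cosine(query, embed[text]) for text, _ in memory]
        return memory[int(np.argmax(scores))][1]

    print(search_policy("open the drawer"))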
arXiv Detail & Related papers (2023-12-10T16:17:00Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
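A minimal sketch of generalized policy improvement over a fixed set of policies, which is the kind of combination the title refers to (the value numbers below are made up for illustration):

    # Sketch of generalized policy improvement (GPI): in each state, act with the
    # action whose value is highest under *any* of the base Q-functions.
    import numpy as np

    # Q[i, s, a]: value of action a in state s under base policy i (illustrative)
    Q = np.array([
        [[1.0, 0.2], [0.1, 0.9]],   # policy 0 is good in state 0
        [[0.3, 0.4], [0.8, 1.2]],   # policy 1 is good in state 1
    ])

    def gpi_action(state):
        """argmax_a max_i Q_i(state, a): combine base policies without retraining."""
        return int(Q[:, state, :].max(axis=0).argmax())

    for s in range(Q.shape[1]):
        print(f"state {s}: GPI picks action {gpi_action(s)}")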
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Unsupervised Reinforcement Learning in Multiple Environments [37.5349071806395]
We address the problem of unsupervised reinforcement learning in a class of multiple environments.
We present a policy gradient algorithm, $\alpha$MEPOL, to optimize the introduced objective through mediated interactions with the class of environments.
We show that reinforcement learning greatly benefits from the pre-trained exploration strategy.
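A rough illustration of the flavour of this objective (the k-nearest-neighbour entropy estimate and the worst-fraction scoring below are simplifications, not the paper's implementation): state entropy is estimated non-parametrically per environment, and the policy is scored on the worst-performing fraction of the class rather than on the average.

    # Illustrative sketch only: per-environment state-entropy estimates combined
    # through a percentile-style objective over a class of environments.
    import numpy as np

    rng = np.random.default_rng(2)

    def knn_entropy(states, k=4):
        """k-NN entropy estimate (up to constants): mean log distance to the k-th neighbour."""
        d = np.abs(states[:, None] - states[None, :])   # pairwise distances, 1-D states
        kth = np.sort(d, axis=1)[:, k]                  # column 0 is the self-distance
        return float(np.mean(np.log(kth + 1e-8)))

    def percentile_objective(entropies, alpha=0.3):
        """Mean entropy over the worst alpha-fraction of environments."""
        worst = np.sort(entropies)[: max(1, int(alpha * len(entropies)))]
        return float(np.mean(worst))

    # Fake visitation data: one array of visited states per environment in the class.
    visits = [rng.normal(scale=s, size=200) for s in (0.2, 0.5, 1.0, 2.0)]
    entropies = np.array([knn_entropy(v) for v in visits])
    print("per-env entropies:", np.round(entropies, 2))
    print("alpha-percentile objective:", round(percentile_objective(entropies), 2))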
arXiv Detail & Related papers (2021-12-16T09:54:37Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
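A toy sketch of the regularization idea (the midpoint subgoal, the deterministic policy, and the quadratic penalty are simplified stand-ins for the paper's learned high-level policy and its KL term): the action toward the final goal is pulled toward the action toward an imagined intermediate subgoal.

    # Illustrative sketch only: an imagined subgoal regularizes a goal-conditioned policy.
    import numpy as np

    def high_level_subgoal(state, goal):
        """Stand-in high-level policy: imagine the midpoint as a subgoal."""
        return 0.5 * (state + goal)

    def policy_action(state, target, w):
        """Toy deterministic goal-conditioned policy: move along w * (target - state)."""
        return w * (target - state)

    def regularized_loss(state, goal, w, task_weight=1.0, reg_weight=0.5):
        subgoal = high_level_subgoal(state, goal)
        a_goal = policy_action(state, goal, w)
        a_sub = policy_action(state, subgoal, w)
        task = np.sum((state + a_goal - goal) ** 2)   # toy task term: reach the goal
        reg = np.sum((a_goal - a_sub) ** 2)           # stay close to the subgoal-directed action
        return task_weight * task + reg_weight * reg

    state, goal = np.array([0.0, 0.0]), np.array([2.0, 1.0])
    for w in (0.3, 0.6, 0.9):
        print(f"w={w}: loss={regularized_loss(state, goal, w):.3f}")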
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
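A toy sketch of acting on an inferred latent (the least-squares gain estimate below stands in for the paper's learned latent variable model): a drifting dynamics parameter is re-estimated from a window of recent transitions, and the controller conditions on that estimate so behaviour tracks the changing environment.

    # Illustrative sketch only: infer a latent environment parameter from recent
    # experience and condition the policy on it.
    import numpy as np

    rng = np.random.default_rng(3)
    window, x, history = 50, 1.0, []

    for t in range(500):
        true_gain = 1.0 + 0.5 * np.sin(t / 100.0)        # slowly drifting dynamics
        if len(history) >= window:                        # "inference" over a recent window
            a, dx = np.array(history[-window:]).T
            latent_gain = float(a @ dx / (a @ a + 1e-8))
        else:
            latent_gain = 1.0
        action = -x / latent_gain                          # policy conditioned on the latent
        effect = true_gain * action + 0.01 * rng.normal()
        history.append((action, effect))
        x = x + effect + 0.05 * rng.normal()               # disturbance keeps the task alive
        if t % 100 == 0:
            print(f"t={t:3d}  estimated gain={latent_gain:.2f}  "
                  f"true gain={true_gain:.2f}  |x|={abs(x):.3f}")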
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits.
In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state.
In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
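A toy sketch of the two phases under simplifying assumptions: the latent-state partition of the logs is given rather than learned, and the online switch is a plain sliding-window rule rather than the paper's procedure.

    # Illustrative sketch only: offline, learn the best arm per latent regime from
    # logged bandit data; online, switch between those sub-policies by recent reward.
    import numpy as np

    rng = np.random.default_rng(4)
    true_means = {0: np.array([0.8, 0.2]), 1: np.array([0.1, 0.9])}  # reward per arm, per regime

    # Offline phase: logs are assumed already partitioned into latent regimes.
    logs = {s: rng.binomial(1, true_means[s], size=(500, 2)) for s in (0, 1)}
    sub_policy = {s: int(logs[s].mean(axis=0).argmax()) for s in (0, 1)}

    # Online phase: the regime changes midway; forced exploration every 20 steps
    # keeps both sub-policies' reward estimates fresh.
    window, total = {0: [], 1: []}, 0.0
    for t in range(400):
        regime = 0 if t < 200 else 1
        if t % 20 in (0, 1):
            choice = t % 2
        else:
            score = {s: np.mean(w[-20:]) if w else 0.5 for s, w in window.items()}
            choice = max(score, key=score.get)
        reward = float(rng.binomial(1, true_means[regime][sub_policy[choice]]))
        window[choice].append(reward)
        total += reward
    print("sub-policies (best arm per regime):", sub_policy,
          " online average reward:", round(total / 400, 2))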
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
In contrast, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
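One simplified reading of such a combination (not the paper's method): the perception system's uncertainty decides how much weight a model-based controller gets relative to a learned policy.

    # Illustrative sketch only: blend a model-based action with a learned action
    # according to perception uncertainty.
    import numpy as np

    def model_based_action(estimated_goal, state):
        """Classical controller: move straight toward the estimated goal."""
        return estimated_goal - state

    def learned_action(state, rng):
        """Stand-in for an RL policy trained to act near the goal region."""
        return rng.normal(scale=0.1, size=state.shape)

    def blended_action(state, estimated_goal, uncertainty, rng):
        w = np.clip(1.0 - uncertainty, 0.0, 1.0)   # confidence weight from perception
        return w * model_based_action(estimated_goal, state) + (1.0 - w) * learned_action(state, rng)

    rng = np.random.default_rng(5)
    state = np.array([0.0, 0.0])
    for uncertainty in (0.05, 0.5, 0.95):
        a = blended_action(state, np.array([1.0, 1.0]), uncertainty, rng)
        print(f"uncertainty={uncertainty:.2f} -> action {np.round(a, 2)}")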
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
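A minimal sketch of the reward-conditioning idea behind the last entry (the nearest-neighbour model below stands in for a learned policy network): logged actions supervise a model of action given state and achieved return, and acting well amounts to querying that model with a high target return.

    # Illustrative sketch only: a reward-conditioned policy fit to relabelled logs.
    import numpy as np

    rng = np.random.default_rng(6)

    def env_reward(state, action):
        """Toy task: the reward is highest when the action matches the state."""
        return 1.0 - abs(action - state)

    # Logged data from a random behaviour policy, relabelled with the achieved reward.
    states = rng.uniform(0, 1, size=2000)
    actions = rng.uniform(0, 1, size=2000)
    returns = np.array([env_reward(s, a) for s, a in zip(states, actions)])

    def reward_conditioned_policy(state, target_return, k=20):
        """Nearest-neighbour stand-in for a learned model of p(action | state, return)."""
        d = (states - state) ** 2 + (returns - target_return) ** 2
        return float(actions[np.argsort(d)[:k]].mean())

    for target in (0.3, 0.7, 1.0):
        a = reward_conditioned_policy(0.6, target)
        print(f"target return {target:.1f}: action {a:.2f}, achieved {env_reward(0.6, a):.2f}")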
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information above and is not responsible for any consequences arising from its use.