C-Learning: Horizon-Aware Cumulative Accessibility Estimation
- URL: http://arxiv.org/abs/2011.12363v3
- Date: Tue, 26 Jan 2021 03:05:38 GMT
- Title: C-Learning: Horizon-Aware Cumulative Accessibility Estimation
- Authors: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L.
Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg
- Abstract summary: We introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon.
We show that these functions obey a recurrence relation, which enables learning from offline interactions.
We evaluate our approach on a set of multi-goal discrete and continuous control tasks.
- Score: 29.588146016880284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-goal reaching is an important problem in reinforcement learning needed
to achieve algorithmic generalization. Despite recent advances in this field,
current algorithms suffer from three major challenges: high sample complexity,
learning only a single way of reaching the goals, and difficulties in solving
complex motion planning tasks. In order to address these limitations, we
introduce the concept of cumulative accessibility functions, which measure the
reachability of a goal from a given state within a specified horizon. We show
that these functions obey a recurrence relation, which enables learning from
offline interactions. We also prove that optimal cumulative accessibility
functions are monotonic in the planning horizon. Additionally, our method can
trade off speed and reliability in goal-reaching by suggesting multiple paths
to a single goal depending on the provided horizon. We evaluate our approach on
a set of multi-goal discrete and continuous control tasks. We show that our
method outperforms state-of-the-art goal-reaching algorithms in success rate,
sample complexity, and path optimality. Our code is available at
https://github.com/layer6ai-labs/CAE, and additional visualizations can be
found at https://sites.google.com/view/learning-cae/.
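To make the recurrence concrete, the sketch below computes a tabular cumulative accessibility function on a toy deterministic chain MDP. The environment, horizon, and all names are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

# Minimal tabular sketch of the cumulative accessibility recurrence on a toy
# deterministic chain MDP. Everything here is an illustrative assumption.

N, H = 8, 10          # chain length and maximum horizon
GOAL = N - 1

def step(s, a):
    """Deterministic transition: action 0 moves left, action 1 moves right."""
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

# C[h, s] approximates C*(s, g, h): is goal g reachable from s within h steps?
C = np.zeros((H + 1, N))
C[0, GOAL] = 1.0      # base case: horizon 0 succeeds only if already at the goal

# Recurrence: accessibility within h steps is the best successor's
# accessibility within h - 1 steps (and 1 at the goal itself).
for h in range(1, H + 1):
    for s in range(N):
        C[h, s] = 1.0 if s == GOAL else max(C[h - 1, step(s, a)] for a in (0, 1))

# Monotonicity in the horizon, as proved in the paper: C*(s, g, h) <= C*(s, g, h+1).
assert np.all(C[:-1] <= C[1:])

# Horizon-aware greedy policy (assumes h >= 1): pick the action whose
# successor is most accessible within the remaining budget.
def act(s, h):
    return max((0, 1), key=lambda a: C[h - 1, step(s, a)])
```

Because the function is indexed by the horizon, the greedy policy can change with the remaining budget h, which is how the horizon-aware trade-off between speed and reliability described in the abstract arises.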
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
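As a rough illustration of "gradient-based updates through continuous relaxation" with annealing, here is a generic sketch for a binary quadratic objective; the penalty schedule and every name below are assumptions, not the paper's QQA algorithm.

```python
import numpy as np

# Generic sketch of combinatorial optimization via continuous relaxation with
# an annealed binarization penalty. Illustration only, not the paper's QQA.

rng = np.random.default_rng(0)
n = 20
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                      # symmetric QUBO: minimize x^T Q x, x in {0,1}^n

theta = rng.normal(scale=0.1, size=n)  # logits of the relaxed variables
lr = 0.05
for t in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))   # relaxed solution p in (0,1)^n
    lam = 2.0 * t / 2000               # anneal the penalty pushing p toward {0,1}
    # gradient of p^T Q p + lam * sum p(1-p), chained through the sigmoid
    grad_p = 2 * Q @ p + lam * (1 - 2 * p)
    theta -= lr * grad_p * p * (1 - p)

x = (p > 0.5).astype(int)              # round the relaxation to a binary solution
print("objective:", x @ Q @ x)
```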
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
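As a schematic of such a hierarchical decomposition (not HIQL itself), the sketch below derives both a subgoal-proposing high-level policy and a subgoal-reaching low-level policy from a single goal-conditioned value function on a toy grid; all names and the BFS value oracle are assumptions.

```python
from collections import deque

# Schematic of a hierarchical goal-reaching decomposition: a high-level policy
# proposes a subgoal, a low-level policy steps toward it, and both read off one
# goal-conditioned value function. Illustration only, not the HIQL algorithm.

N = 6
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def neighbors(s):
    return [(s[0] + dx, s[1] + dy) for dx, dy in MOVES
            if 0 <= s[0] + dx < N and 0 <= s[1] + dy < N]

def value(s, g):
    """V(s, g) = -(shortest-path steps from s to g), via BFS."""
    dist, frontier = {s: 0}, deque([s])
    while frontier:
        u = frontier.popleft()
        if u == g:
            return -dist[u]
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return float("-inf")

def high_level(s, g, k=3):
    """Subgoal: the state within k steps of s that is closest to g."""
    cands = [(x, y) for x in range(N) for y in range(N) if value(s, (x, y)) >= -k]
    return max(cands, key=lambda w: value(w, g))

def low_level(s, w):
    """Greedy one-step move toward the current subgoal."""
    return max(neighbors(s), key=lambda v: value(v, w))

# Roll out: replan a subgoal each step and step toward it.
s, g = (0, 0), (5, 5)
while s != g:
    s = low_level(s, high_level(s, g))
print("reached", s)
```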
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
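For context, this is the basic Q-learning protocol with linear function approximation that such work builds on; the toy environment and feature map below are assumptions, and the paper's exploration mechanism is omitted.

```python
import numpy as np

# Standard Q-learning with linear function approximation on a toy MDP.
# The environment and feature map are toy assumptions for illustration.

rng = np.random.default_rng(0)
n_states, n_actions, d = 5, 2, 10
phi = rng.normal(size=(n_states, n_actions, d))   # fixed feature map phi(s, a)
w = np.zeros(d)                                    # Q(s, a) ~ phi(s, a) @ w

def step(s, a):
    """Toy dynamics: the action nudges which state comes next."""
    s_next = (s + a + 1) % n_states if rng.random() < 0.8 else rng.integers(n_states)
    return s_next, float(s_next == n_states - 1)   # reward only in the last state

alpha, gamma, eps = 0.05, 0.9, 0.1
s = 0
for _ in range(5000):
    # epsilon-greedy action on the linear Q estimate
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(phi[s] @ w))
    s_next, r = step(s, a)
    td_error = r + gamma * np.max(phi[s_next] @ w) - phi[s, a] @ w
    w += alpha * td_error * phi[s, a]              # semi-gradient TD update
    s = s_next
```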
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
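As a rough sketch of a dynamical distance function, the snippet below estimates the expected number of steps between visited states from random-walk trajectories; the tabular estimator and environment are illustrative assumptions, not the paper's self-supervised method.

```python
import random
from collections import defaultdict

# Rough sketch of a dynamical distance function (DDF): regress the number of
# steps elapsed between two states visited on the same trajectory.

random.seed(0)
N = 10                                 # chain of states 0..N-1

def rollout(T=30):
    s, traj = random.randrange(N), []
    for _ in range(T):
        traj.append(s)
        s = max(0, min(N - 1, s + random.choice((-1, 1))))
    return traj

sums, counts = defaultdict(float), defaultdict(int)
for _ in range(2000):
    traj = rollout()
    i, j = sorted(random.sample(range(len(traj)), 2))
    pair = (traj[i], traj[j])
    sums[pair] += j - i                # time elapsed between the two visits
    counts[pair] += 1

def ddf(s, g):
    """Estimated dynamical distance: mean observed steps from s to g (inf if unseen)."""
    return sums[s, g] / counts[s, g] if counts[s, g] else float("inf")

# A goal-generation curriculum could then pick goals at a target difficulty.
print(ddf(0, 3), ddf(0, 9))
```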
arXiv Detail & Related papers (2021-11-07T16:23:56Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is conflicting gradients between task objectives.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function while leveraging the worst local improvement among individual tasks to regularize the update direction.
CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss.
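The sketch below only illustrates the conflicting-gradients problem on two toy quadratic losses and the plain average-gradient step that CAGrad regularizes around; CAGrad's inner optimization over task weights is omitted.

```python
import numpy as np

# Sketch of the conflicting-gradients problem on two toy quadratic task losses.
# Shows conflict detection and the naive average-gradient update only.

def grad_task1(x):  # gradient of L1(x) = ||x - a||^2 with a = (1, 0)
    return 2 * (x - np.array([1.0, 0.0]))

def grad_task2(x):  # gradient of L2(x) = ||x - b||^2 with b = (-1, 0)
    return 2 * (x - np.array([-1.0, 0.0]))

x = np.array([0.0, 0.5])
g1, g2 = grad_task1(x), grad_task2(x)

# Gradients "conflict" when they point in opposing directions (negative cosine):
cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
print("conflict" if cos < 0 else "no conflict", cos)

g0 = (g1 + g2) / 2      # the average gradient; naive descent follows this
x = x - 0.1 * g0
```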
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
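As an illustration of the E-step idea, the sketch below plans waypoints to a distant goal with graph search on a toy grid; the graph, costs, and waypoint spacing are assumptions, and the M-step (training a goal-conditioned policy on those waypoints) is not shown.

```python
import heapq

# Illustration of the E-step: plan a sequence of waypoints to a distant goal
# with graph search over the state graph. Toy grid, unit edge costs.

N = 8
def neighbors(s):
    x, y = s
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < N and 0 <= y + dy < N]

def shortest_path(start, goal):
    """Dijkstra with unit edge costs, returning the start-to-goal path."""
    dist, prev, pq = {start: 0}, {}, [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist[u]:
            continue
        for v in neighbors(u):
            if dist.get(v, float("inf")) > d + 1:
                dist[v], prev[v] = d + 1, u
                heapq.heappush(pq, (d + 1, v))
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]

# Subsample the path into waypoints a low-level policy could actually reach.
path = shortest_path((0, 0), (7, 7))
waypoints = path[::3] + [path[-1]]
print(waypoints)
```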
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
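To unpack the NML computation in the simplest possible setting: for each candidate outcome, refit the maximum-likelihood model with that outcome appended and normalize the resulting scores. The Bernoulli example below is an assumption for illustration, far simpler than the meta-learned classifier in the paper.

```python
# Minimal worked example of a normalized maximum likelihood (NML) distribution
# for a Bernoulli variable. This only unpacks the NML computation itself.

data = [1, 1, 1, 0]          # observed outcomes so far

def mle_prob(outcomes, y):
    """P(y) under the Bernoulli MLE fitted to `outcomes`."""
    theta = sum(outcomes) / len(outcomes)
    return theta if y == 1 else 1 - theta

# NML: for each candidate outcome y, append y, refit the MLE, score y itself,
# then normalize across candidates.
scores = {y: mle_prob(data + [y], y) for y in (0, 1)}
Z = sum(scores.values())
p_nml = {y: s / Z for y, s in scores.items()}
print(p_nml)   # uncertainty-aware: strictly between 0 and 1 even for rare labels
```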
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can greatly affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
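A minimal sketch of a disagreement-driven goal curriculum, assuming goals are scored by the spread of an ensemble of value estimates; the random ensemble below stands in for a trained agent's.

```python
import numpy as np

# Sketch of a disagreement-based goal curriculum: score candidate goals by the
# spread of an ensemble of value estimates and sample goals proportionally.
# The ensemble values here are random stand-ins, not a trained agent's.

rng = np.random.default_rng(0)
n_goals, n_members = 50, 5
q_ensemble = rng.random((n_members, n_goals))   # Q_i(s0, g) for each member i

disagreement = q_ensemble.std(axis=0)           # high spread = frontier goals
probs = disagreement / disagreement.sum()
curriculum_goal = rng.choice(n_goals, p=probs)  # train on a sampled frontier goal
print(curriculum_goal)
```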
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback [0.0]
Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm.
We present three algorithms based on the existing HER algorithm that improve its performance.
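For reference, here is a minimal sketch of HER's hindsight relabeling with the "final" strategy; the transition format and reward are assumptions, and the three proposed improvements from the paper are not reproduced here.

```python
import random

# Minimal sketch of Hindsight Experience Replay (HER) relabeling with the
# "final" strategy: a failed trajectory is reused as a success for the goal
# that was actually achieved.

random.seed(0)

def relabel_final(trajectory):
    """trajectory: list of (state, action, next_state) toward some failed goal."""
    achieved = trajectory[-1][2]                 # treat the final state as the goal
    relabeled = []
    for s, a, s_next in trajectory:
        reward = 1.0 if s_next == achieved else 0.0
        relabeled.append((s, a, achieved, reward, s_next))
    return relabeled

# Example: a 1-D random walk that never reached its original goal.
traj, s = [], 0
for _ in range(5):
    a = random.choice((-1, 1))
    traj.append((s, a, s + a))
    s += a
replay_buffer = relabel_final(traj)              # now contains a "successful" episode
print(replay_buffer)
```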
arXiv Detail & Related papers (2020-01-12T07:22:15Z)