Metric Residual Networks for Sample Efficient Goal-conditioned
Reinforcement Learning
- URL: http://arxiv.org/abs/2208.08133v1
- Date: Wed, 17 Aug 2022 08:04:41 GMT
- Title: Metric Residual Networks for Sample Efficient Goal-conditioned
Reinforcement Learning
- Authors: Bo Liu, Yihao Feng, Qiang Liu, Peter Stone
- Abstract summary: Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications.
Sample efficiency is of utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal.
We introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture.
- Score: 52.59242013527014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-conditioned reinforcement learning (GCRL) has a wide range of potential
real-world applications, including manipulation and navigation problems in
robotics. Especially in such robotics tasks, sample efficiency is of the utmost
importance for GCRL since, by default, the agent is only rewarded when it
reaches its goal. While several methods have been proposed to improve the
sample efficiency of GCRL, one relatively under-studied approach is the design
of neural architectures to support sample efficiency. In this work, we
introduce a novel neural architecture for GCRL that achieves significantly
better sample efficiency than the commonly-used monolithic network
architecture. The key insight is that the optimal action-value function Q^*(s,
a, g) must satisfy the triangle inequality in a specific sense. Furthermore, we
introduce the metric residual network (MRN) that deliberately decomposes the
action-value function Q(s,a,g) into the negated summation of a metric plus a
residual asymmetric component. MRN provably approximates any optimal
action-value function Q^*(s,a,g), thus making it a fitting neural architecture
for GCRL. We conduct comprehensive experiments across 12 standard benchmark
environments in GCRL. The empirical results demonstrate that MRN uniformly
outperforms other state-of-the-art GCRL neural architectures in terms of sample
efficiency.
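Because -Q^*(s,a,g) behaves like a quasimetric (it obeys a triangle inequality but need not be symmetric), the decomposition above can be realized with two pairs of embedding networks. Below is a minimal PyTorch sketch of such an architecture; the layer sizes, the Euclidean form of the metric term, and the max-over-coordinates form of the asymmetric residual are illustrative assumptions, not necessarily the authors' exact design.

    import torch
    import torch.nn as nn

    class MRN(nn.Module):
        """Sketch: Q(s, a, g) = -(symmetric metric term + asymmetric residual)."""

        def __init__(self, state_dim, action_dim, goal_dim, hidden=256, embed=64):
            super().__init__()

            def mlp(in_dim):
                return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, embed))

            # One embedding pair for the metric term, one for the residual.
            self.f_sa, self.f_g = mlp(state_dim + action_dim), mlp(goal_dim)
            self.h_sa, self.h_g = mlp(state_dim + action_dim), mlp(goal_dim)

        def forward(self, s, a, g):
            sa = torch.cat([s, a], dim=-1)
            # Symmetric metric term: Euclidean distance between embeddings.
            sym = torch.linalg.vector_norm(self.f_sa(sa) - self.f_g(g), dim=-1)
            # Asymmetric residual: max over coordinates of the rectified
            # difference (a standard quasimetric construction; an assumption
            # here, not necessarily the paper's exact parameterization).
            asym = torch.relu(self.h_sa(sa) - self.h_g(g)).amax(dim=-1)
            return -(sym + asym)

Both terms are nonnegative, so the predicted values are nonpositive, which is consistent with the common sparse-reward convention of -1 per step until the goal is reached.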
Related papers
- Quasimetric Value Functions with Dense Rewards [1.6574413179773761]
We show that the key property underpinning a quasimetric, viz., the triangle inequality, is preserved under a dense reward setting.
Dense reward functions that satisfy this condition can only improve, never worsen, sample complexity.
This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity. (The triangle inequality in question is written out after this list.)
arXiv Detail & Related papers (2024-09-13T11:26:05Z)
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL [109.44370201929246]
We show that training value functions with categorical cross-entropy, instead of the usual mean-squared-error regression, improves performance and scalability in a variety of domains (a sketch of one such categorical loss appears after this list).
These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
arXiv Detail & Related papers (2024-03-06T18:55:47Z) - SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By adding learnable memory tokens with an attention mechanism, the approach boosts performance without large computational overhead (a sketch of the generic mechanism appears after this list).
We evaluate the approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL demonstrates improved sample efficiency and performance.
arXiv Detail & Related papers (2023-04-03T17:59:58Z)
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability [11.786486763236104]
Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals.
Offline GCRL performs this training purely from pre-collected datasets, without further environment interaction.
We show that a modified offline GCRL algorithm is provably efficient under both general function approximation and single-policy concentrability.
arXiv Detail & Related papers (2023-02-07T22:04:55Z)
- AIO-P: Expanding Neural Performance Predictors Beyond Image Classification [22.743278613519152]
We propose a novel All-in-One Predictor (AIO-P) to pretrain neural predictors on architecture examples.
AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively.
arXiv Detail & Related papers (2022-11-30T18:30:41Z)
- A Novel Genetic Algorithm with Hierarchical Evaluation Strategy for Hyperparameter Optimisation of Graph Neural Networks [7.139436410105177]
This research presents a novel genetic algorithm with a hierarchical evaluation strategy (HESGA).
The hierarchical strategy uses a fast evaluation at the lower level to recommend candidates to the higher level, where a full evaluation acts as the final assessor, maintaining a group of elite individuals (a schematic of this two-tier loop appears after this list).
arXiv Detail & Related papers (2021-01-22T19:19:59Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 is a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be embedded into existing gradient-based DNN training with only one line of code (a sketch appears after this list).
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
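For the quasimetric entries above (Quasimetric Value Functions with Dense Rewards, and Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning), the triangle inequality in question is the defining property of a quasimetric: a distance-like function that need not be symmetric. Writing d for the cost-to-go induced by the optimal value function, it reads:

    \[
      d(s_1, s_3) \le d(s_1, s_2) + d(s_2, s_3)
      \quad \text{for all } s_1, s_2, s_3,
      \qquad d(s, s) = 0.
    \]
    % Unlike a metric, a quasimetric need not be symmetric:
    % d(s_1, s_2) = d(s_2, s_1) is not required.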
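For the Stop Regressing entry: the idea is to discretize the scalar return range into bins and train the value head with cross-entropy instead of squared error. A minimal sketch of one such scheme, a two-hot target (the paper compares several target constructions, so treat the details below as illustrative; v_min, v_max, and the bin count are placeholder hyperparameters):

    import torch
    import torch.nn.functional as F

    def two_hot_cross_entropy(logits, target, v_min=-10.0, v_max=10.0):
        """Cross-entropy against a two-hot encoding of a scalar TD target.

        logits: (batch, num_bins) value-head outputs.
        target: (batch,) scalar regression targets.
        Requires PyTorch >= 1.10 (probability targets in cross_entropy).
        """
        num_bins = logits.shape[-1]
        # Map each target to a fractional bin position in [0, num_bins - 1].
        pos = (target.clamp(v_min, v_max) - v_min) / (v_max - v_min) * (num_bins - 1)
        lo, frac = pos.floor().long(), pos - pos.floor()
        hi = (lo + 1).clamp(max=num_bins - 1)
        # Split unit probability mass between the two neighboring bins.
        probs = torch.zeros_like(logits)
        probs.scatter_(-1, lo.unsqueeze(-1), (1.0 - frac).unsqueeze(-1))
        probs.scatter_add_(-1, hi.unsqueeze(-1), frac.unsqueeze(-1))
        return F.cross_entropy(logits, probs)

    def logits_to_value(logits, v_min=-10.0, v_max=10.0):
        """Recover a scalar value as the expectation over bin centers."""
        centers = torch.linspace(v_min, v_max, logits.shape[-1],
                                 device=logits.device)
        return torch.softmax(logits, dim=-1) @ centers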
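For the Heterogenous Memory Augmented Neural Networks entry: the generic mechanism of learnable memory tokens can be sketched as extra parameters that the input tokens attend over. The heterogeneous design in the paper itself is more involved; this shows only the basic pattern:

    import torch
    import torch.nn as nn

    class MemoryAugmentedAttention(nn.Module):
        """Input tokens attend over themselves plus learnable memory tokens."""

        def __init__(self, dim=128, num_memory_tokens=16, num_heads=4):
            super().__init__()
            self.memory = nn.Parameter(0.02 * torch.randn(num_memory_tokens, dim))
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):  # x: (batch, seq, dim)
            mem = self.memory.unsqueeze(0).expand(x.shape[0], -1, -1)
            kv = torch.cat([x, mem], dim=1)  # keys/values: inputs + memory
            out, _ = self.attn(x, kv, kv)    # queries: the input tokens
            return out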
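For the HESGA entry: the hierarchical evaluation strategy can be pictured as a two-tier loop in which a cheap proxy evaluation recommends candidates and an expensive full evaluation acts as the final assessor. A schematic sketch with placeholder fitness functions:

    def hierarchical_select(population, fast_eval, full_eval,
                            k_recommend=10, k_elite=3):
        """Two-tier evaluation for a genetic algorithm.

        fast_eval: cheap fitness proxy (e.g., a few training epochs).
        full_eval: expensive, accurate fitness (e.g., a full training run).
        Returns the elite individuals kept for the next generation.
        """
        # Lower level: rank the whole population with the cheap proxy.
        recommended = sorted(population, key=fast_eval, reverse=True)[:k_recommend]
        # Higher level: the full evaluation is the final assessor of the shortlist.
        return sorted(recommended, key=full_eval, reverse=True)[:k_elite]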
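For the Gradient Centralization entry: the operation itself fits in a few lines. For every weight tensor of rank two or more, subtract the per-output-channel mean from its gradient between the backward pass and the optimizer step. The explicit-call placement below is one convenient way to apply it (the paper embeds it in the optimizer instead):

    import torch

    @torch.no_grad()
    def centralize_gradients(model):
        """Gradient Centralization: give each weight gradient zero mean
        across all dimensions except the output-channel dimension."""
        for p in model.parameters():
            if p.grad is not None and p.grad.dim() > 1:
                dims = tuple(range(1, p.grad.dim()))
                p.grad -= p.grad.mean(dim=dims, keepdim=True)

    # Usage inside a training loop:
    #   loss.backward()
    #   centralize_gradients(model)   # the "one line of code" from the entry
    #   optimizer.step()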
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.