How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- URL: http://arxiv.org/abs/2310.03494v2
- Date: Mon, 11 Dec 2023 00:57:18 GMT
- Title: How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas and
Stefano V. Albrecht
- Abstract summary: A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning is their limited ability to generalise to new environments.
We investigate how a non-uniform sampling strategy of individual environment instances affects the zero-shot generalisation ability of RL agents.
- Score: 12.79149059358717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key limitation preventing the wider adoption of autonomous agents trained
via deep reinforcement learning (RL) is their limited ability to generalise to
new environments, even when these share similar characteristics with
environments encountered during training. In this work, we investigate how a
non-uniform sampling strategy of individual environment instances, or levels,
affects the zero-shot generalisation (ZSG) ability of RL agents, considering
two failure modes: overfitting and over-generalisation. As a first step, we
measure the mutual information (MI) between the agent's internal representation
and the set of training levels, which we find to be well correlated with instance
overfitting. In contrast to uniform sampling, adaptive sampling strategies
prioritising levels based on their value loss are more effective at maintaining
lower MI, which provides a novel theoretical justification for this class of
techniques. We then turn our attention to unsupervised environment design (UED)
methods, which adaptively generate new training levels and minimise MI more
effectively than methods sampling from a fixed set. However, we find UED
methods significantly shift the training distribution, resulting in
over-generalisation and worse ZSG performance over the distribution of
interest. To prevent both instance overfitting and over-generalisation, we
introduce self-supervised environment design (SSED). SSED generates levels
using a variational autoencoder, effectively reducing MI while minimising the
shift from the distribution of interest, and leads to statistically significant
improvements in ZSG over fixed-set level sampling strategies and UED methods.
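To make the abstract's central mechanism concrete, below is a minimal Python sketch of value-loss-prioritised level sampling, the class of adaptive strategies the paper credits with keeping mutual information low. The class name, the rank-based weighting, and the staleness mixing are illustrative assumptions in the spirit of prioritised level replay, not the authors' code.

```python
import numpy as np

class ValueLossPrioritisedSampler:
    """Sample training levels in proportion to their last observed
    value loss (a minimal sketch, not the paper's implementation)."""

    def __init__(self, num_levels, temperature=0.1, staleness_coef=0.1):
        self.scores = np.zeros(num_levels)            # last value loss per level
        self.steps_since_seen = np.zeros(num_levels)  # staleness counter
        self.temperature = temperature
        self.staleness_coef = staleness_coef

    def update(self, level_id, value_loss):
        # Record the value-prediction error the agent incurred on this level.
        self.scores[level_id] = value_loss
        self.steps_since_seen += 1
        self.steps_since_seen[level_id] = 0

    def sample(self, rng):
        # Rank-based prioritisation: higher value loss -> higher probability.
        ranks = np.argsort(np.argsort(-self.scores)) + 1
        weights = (1.0 / ranks) ** (1.0 / self.temperature)
        p_score = weights / weights.sum()
        # Mix in staleness so rarely visited levels are eventually replayed.
        p_stale = self.steps_since_seen / max(self.steps_since_seen.sum(), 1.0)
        probs = (1 - self.staleness_coef) * p_score + self.staleness_coef * p_stale
        probs /= probs.sum()  # renormalise (p_stale is all-zero before any update)
        return rng.choice(len(self.scores), p=probs)
```

On the paper's account, sampling levels this way maintains lower MI between the agent's internal representation and the set of training levels than uniform sampling does.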
Related papers
- COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping is a setting in which the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions.
Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans.
We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z)
- Hybrid Classification-Regression Adaptive Loss for Dense Object Detection [19.180514552400883]
We propose a Hybrid Classification-Regression Adaptive Loss, termed HCRAL.
We introduce the Residual of Classification and IoU (RCI) module for cross-task supervision, addressing task inconsistencies, and the Conditioning Factor (CF) to focus on difficult-to-train samples within each task.
We also introduce a new strategy named Expanded Adaptive Training Sample Selection (EATSS) to provide additional samples that exhibit classification and regression inconsistencies.
arXiv Detail & Related papers (2024-08-30T10:31:39Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
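For the instance-reweighted distributionally robust optimization framework in the entry above, the KL-regularised inner problem has a well-known closed form: sample weights proportional to exp(loss / tau). The sketch below illustrates that weighting; the function name and temperature default are assumptions, not taken from the paper.

```python
import numpy as np

def instance_reweighted_loss(per_sample_losses, tau=1.0):
    """Closed-form weights for KL-regularised instance-reweighted DRO.

    The inner maximisation max_w <w, loss> - tau * KL(w || uniform) is
    solved by w_i proportional to exp(loss_i / tau), i.e. hard samples
    are upweighted. A sketch, not the paper's implementation.
    """
    losses = np.asarray(per_sample_losses, dtype=np.float64)
    logits = losses / tau
    weights = np.exp(logits - logits.max())  # numerically stabilised softmax
    weights /= weights.sum()
    return float(np.dot(weights, losses)), weights
```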
- DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents.
We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data.
We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED).
arXiv Detail & Related papers (2024-02-05T19:47:45Z)
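A standard way to estimate the mutual information that the DRED entry above (and the main paper) seek to minimise is a classifier probe: I(Z; L) >= H(L) - CE, where CE is the cross-entropy of a classifier predicting the level identity L from the representation Z. A hedged sketch of that lower bound follows; the linear probe is an illustrative choice, not the authors' estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def mi_lower_bound(features, level_ids):
    """Lower-bound I(representation; level identity) with a linear probe.

    I(Z; L) >= H(L) - CE(probe), with both terms in nats. In practice
    the probe should be scored on held-out data to avoid optimism.
    """
    probe = LogisticRegression(max_iter=1000).fit(features, level_ids)
    ce = log_loss(level_ids, probe.predict_proba(features))  # natural log
    counts = np.bincount(level_ids)
    p = counts[counts > 0] / counts.sum()
    entropy = -np.sum(p * np.log(p))  # H(L) of the empirical level marginal
    return max(entropy - ce, 0.0)
```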
- Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization [37.464005524259356]
We present a new attack based on sensitivity curve maximization (SCM).
We demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small but effective perturbations.
arXiv Detail & Related papers (2023-04-27T08:41:57Z)
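The object the SCM attack above maximises is the classical sensitivity curve from robust statistics, which measures how far a single injected point moves an estimator. A minimal sketch; using the median as the aggregator is an illustrative assumption.

```python
import numpy as np

def sensitivity_curve(aggregator, samples, x):
    """SC_n(x) = n * (T(x_1..x_{n-1}, x) - T(x_1..x_{n-1})) for estimator T.

    An SCM-style attacker searches for the perturbation x with the
    largest |SC_n(x)|, i.e. the injection that shifts the aggregate most.
    """
    samples = np.asarray(samples, dtype=np.float64)
    n = len(samples) + 1                        # sample size after injecting x
    base = aggregator(samples)
    poisoned = aggregator(np.append(samples, x))
    return n * (poisoned - base)

# Even a robust aggregator like the median moves under a well-placed point.
rng = np.random.default_rng(0)
print(sensitivity_curve(np.median, rng.normal(size=20), 10.0))
```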
- Generalized Inter-class Loss for Gait Recognition [11.15855312510806]
Gait recognition is a unique biometric technique that can be performed non-cooperatively at a long distance.
Previous gait works focus on minimizing intra-class variance while ignoring the significance of constraining inter-class variance.
We propose a generalized inter-class loss which resolves the inter-class variance from both sample-level feature distribution and class-level feature distribution.
arXiv Detail & Related papers (2022-10-13T06:44:53Z)
- Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performance and fail to generalize.
In this work, we investigate such behavior through the lens of the geometry of the loss landscape and the Hessian eigenspectrum.
Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights on the server side can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z)
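SAM, referenced in the federated-learning entry above, perturbs the weights towards the approximate worst point in an L2 ball of radius rho before taking the descent step. A minimal NumPy sketch of one step on a flat parameter vector; real implementations (and the federated variant) operate per layer and per client, and grad_fn here is a placeholder for backpropagation.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.01, rho=0.05):
    """One Sharpness-Aware Minimization step (sketch).

    1. First-order inner maximiser: eps = rho * g / ||g||.
    2. Descend from w using the gradient evaluated at w + eps.
    grad_fn(w) must return dL/dw; this is not a library API.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_sharp = grad_fn(w + eps)                   # gradient at perturbed point
    return w - lr * g_sharp
```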
- Revisiting Deep Semi-supervised Learning: An Empirical Distribution Alignment Framework and Its Generalization Bound [97.93945601881407]
We propose a new deep semi-supervised learning framework called Semi-supervised Learning by Empirical Distribution Alignment (SLEDA).
We show the generalization error of semi-supervised learning can be effectively bounded by minimizing the training error on labeled data.
Building upon our new framework and the theoretical bound, we develop a simple and effective deep semi-supervised learning method called Augmented Distribution Alignment Network (ADA-Net).
arXiv Detail & Related papers (2022-03-13T11:59:52Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Rethinking Sampling Strategies for Unsupervised Person Re-identification [59.47536050785886]
We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function.
We propose group sampling, which gathers samples from the same class into groups.
Experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T05:39:58Z)
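The group sampling idea in the last entry can be sketched directly: batches are built from groups of indices that share a (pseudo-)class label. The group size and the decision to drop ragged groups are illustrative assumptions, not the paper's settings.

```python
import random
from collections import defaultdict

def group_sampling(labels, group_size=4, seed=0):
    """Return index groups, each drawn from a single (pseudo-)class.

    A sketch of group sampling for unsupervised Re-Id: within-class
    indices are shuffled, chunked into fixed-size groups, and the
    groups themselves are shuffled to form the sampling order.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    groups = []
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in range(0, len(indices), group_size):
            group = indices[i:i + group_size]
            if len(group) == group_size:  # drop incomplete tail groups
                groups.append(group)
    rng.shuffle(groups)
    return groups
```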