How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- URL: http://arxiv.org/abs/2310.03494v2
- Date: Mon, 11 Dec 2023 00:57:18 GMT
- Title: How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas and
Stefano V. Albrecht
- Abstract summary: A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning is their limited ability to generalise to new environments.
We investigate how a non-uniform sampling strategy of individual environment instances affects the zero-shot generalisation ability of RL agents.
- Score: 12.79149059358717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key limitation preventing the wider adoption of autonomous agents trained
via deep reinforcement learning (RL) is their limited ability to generalise to
new environments, even when these share similar characteristics with
environments encountered during training. In this work, we investigate how a
non-uniform sampling strategy of individual environment instances, or levels,
affects the zero-shot generalisation (ZSG) ability of RL agents, considering
two failure modes: overfitting and over-generalisation. As a first step, we
measure the mutual information (MI) between the agent's internal representation
and the set of training levels, which we find to be well-correlated to instance
overfitting. In contrast to uniform sampling, adaptive sampling strategies
prioritising levels based on their value loss are more effective at maintaining
lower MI, which provides a novel theoretical justification for this class of
techniques. We then turn our attention to unsupervised environment design (UED)
methods, which adaptively generate new training levels and minimise MI more
effectively than methods sampling from a fixed set. However, we find UED
methods significantly shift the training distribution, resulting in
over-generalisation and worse ZSG performance over the distribution of
interest. To prevent both instance overfitting and over-generalisation, we
introduce self-supervised environment design (SSED). SSED generates levels
using a variational autoencoder, effectively reducing MI while minimising the
shift with the distribution of interest, and leads to statistically significant
improvements in ZSG over fixed-set level sampling strategies and UED methods.
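To make the value-loss prioritisation discussed in the abstract concrete, the sketch below implements a minimal rank-based level sampler in the spirit of Prioritized Level Replay; the class name, scoring rule, and hyperparameters (temperature, staleness mixing) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

class ValueLossPrioritisedSampler:
    """Minimal sketch of a value-loss-prioritised level sampler.

    Levels whose most recent value loss is high are replayed more often,
    which (per the paper's argument) helps keep the mutual information
    between the agent's internal representation and the level identities low.
    """

    def __init__(self, num_levels, temperature=0.1, staleness_coef=0.1):
        self.scores = np.zeros(num_levels)       # last observed value loss per level
        self.last_visit = np.zeros(num_levels)   # step at which each level was last played
        self.step = 0
        self.temperature = temperature
        self.staleness_coef = staleness_coef

    def update(self, level_id, value_loss):
        """Record the latest value (e.g. TD or GAE) loss observed on a level."""
        self.scores[level_id] = value_loss
        self.last_visit[level_id] = self.step
        self.step += 1

    def sample(self):
        """Draw the next training level, mixing loss-rank and staleness weights."""
        ranks = np.argsort(np.argsort(-self.scores)) + 1            # rank 1 = highest loss
        score_probs = (1.0 / ranks) ** (1.0 / self.temperature)
        score_probs /= score_probs.sum()

        staleness = self.step - self.last_visit
        stale_probs = staleness / max(staleness.sum(), 1.0)

        probs = (1 - self.staleness_coef) * score_probs + self.staleness_coef * stale_probs
        probs /= probs.sum()                                         # renormalise
        return np.random.choice(len(self.scores), p=probs)
```

In a training loop one would call `sample()` to pick the next level and `update(level_id, value_loss)` after each rollout; uniform sampling corresponds to replacing `sample()` with a uniform random draw over level ids.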
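The abstract also describes SSED generating new levels with a variational autoencoder so that training levels reduce MI without drifting from the distribution of interest. Below is a rough, self-contained sketch of that idea, assuming levels can be encoded as fixed-size flattened binary grids; the architecture and names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class LevelVAE(nn.Module):
    """Toy VAE over fixed-size levels represented as flattened binary grids."""

    def __init__(self, level_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(level_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, level_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation
        return self.decoder(z), mu, logvar


def vae_loss(recon_logits, x, mu, logvar):
    # The reconstruction term anchors generated levels to the training set
    # (limiting distributional shift); the KL term regularises the latent space.
    recon = nn.functional.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl


def generate_levels(vae, n, latent_dim=16):
    # New training levels are decoded from latent samples, so they augment
    # the level set while staying close to the distribution of interest.
    with torch.no_grad():
        z = torch.randn(n, latent_dim)
        return torch.sigmoid(vae.decoder(z))
```

Generated levels could then be scored and sampled alongside the original set, for instance with the value-loss prioritisation sketched above.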
Related papers
- Hybrid Classification-Regression Adaptive Loss for Dense Object Detection [19.180514552400883]
We propose a Hybrid Classification-Regression Adaptive Loss, termed HCRAL.
We introduce the Residual of Classification and IoU (RCI) module for cross-task supervision, addressing task inconsistencies, and the Conditioning Factor (CF) to focus on difficult-to-train samples within each task.
We also introduce a new strategy named Expanded Adaptive Training Sample Selection (EATSS) to provide additional samples that exhibit classification and regression inconsistencies.
arXiv Detail & Related papers (2024-08-30T10:31:39Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents.
We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data.
We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED).
arXiv Detail & Related papers (2024-02-05T19:47:45Z)
- Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection [64.50126371767476]
We propose Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation (UniCon-HA)
We explicitly encourage the concentration of inliers and the dispersion of virtual outliers via supervised and unsupervised contrastive losses.
Our method is evaluated under three AD settings including unlabeled one-class, unlabeled multi-class, and labeled multi-class.
arXiv Detail & Related papers (2023-08-20T04:01:50Z)
- Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization [37.464005524259356]
We present a new attack based on sensitivity curve maximization (SCM).
We demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small but effective perturbations.
arXiv Detail & Related papers (2023-04-27T08:41:57Z)
- Generalized Inter-class Loss for Gait Recognition [11.15855312510806]
Gait recognition is a unique biometric technique that can be performed at a long distance non-cooperatively.
Previous gait works focus on minimizing the intra-class variance while overlooking the importance of constraining the inter-class variance.
We propose a generalized inter-class loss which resolves the inter-class variance from both sample-level feature distribution and class-level feature distribution.
arXiv Detail & Related papers (2022-10-13T06:44:53Z)
- Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performances and fail at generalizing.
In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum.
Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z)
- Revisiting Deep Semi-supervised Learning: An Empirical Distribution Alignment Framework and Its Generalization Bound [97.93945601881407]
We propose a new deep semi-supervised learning framework called Semi-supervised Learning by Empirical Distribution Alignment (SLEDA)
We show the generalization error of semi-supervised learning can be effectively bounded by minimizing the training error on labeled data.
Building upon our new framework and the theoretical bound, we develop a simple and effective deep semi-supervised learning method called Augmented Distribution Alignment Network (ADA-Net)
arXiv Detail & Related papers (2022-03-13T11:59:52Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Adaptive Adversarial Logits Pairing [65.51670200266913]
Models trained with the adversarial training solution Adversarial Logits Pairing (ALP) tend to rely on fewer high-contribution features than vulnerable models.
Motivated by these observations, we design an Adaptive Adversarial Logits Pairing (AALP) solution by modifying the training process and training target of ALP.
AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features.
arXiv Detail & Related papers (2020-05-25T03:12:20Z)