How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- URL: http://arxiv.org/abs/2310.03494v2
- Date: Mon, 11 Dec 2023 00:57:18 GMT
- Title: How the level sampling process impacts zero-shot generalisation in deep
reinforcement learning
- Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas and
Stefano V. Albrecht
- Abstract summary: A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning is their limited ability to generalise to new environments.
We investigate how a non-uniform sampling strategy of individual environment instances affects the zero-shot generalisation ability of RL agents.
- Score: 12.79149059358717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key limitation preventing the wider adoption of autonomous agents trained
via deep reinforcement learning (RL) is their limited ability to generalise to
new environments, even when these share similar characteristics with
environments encountered during training. In this work, we investigate how a
non-uniform sampling strategy of individual environment instances, or levels,
affects the zero-shot generalisation (ZSG) ability of RL agents, considering
two failure modes: overfitting and over-generalisation. As a first step, we
measure the mutual information (MI) between the agent's internal representation
and the set of training levels, which we find to be well-correlated to instance
overfitting. In contrast to uniform sampling, adaptive sampling strategies
prioritising levels based on their value loss are more effective at maintaining
lower MI, which provides a novel theoretical justification for this class of
techniques. We then turn our attention to unsupervised environment design (UED)
methods, which adaptively generate new training levels and minimise MI more
effectively than methods sampling from a fixed set. However, we find UED
methods significantly shift the training distribution, resulting in
over-generalisation and worse ZSG performance over the distribution of
interest. To prevent both instance overfitting and over-generalisation, we
introduce self-supervised environment design (SSED). SSED generates levels
using a variational autoencoder, effectively reducing MI while minimising the
shift with the distribution of interest, and leads to statistically significant
improvements in ZSG over fixed-set level sampling strategies and UED methods.
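To make the value-loss prioritisation discussed in the abstract concrete, the sketch below implements a minimal rank-based level sampler in the spirit of Prioritized Level Replay; the class name, scoring rule, and hyperparameters (temperature, staleness mixing) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

class ValueLossPrioritisedSampler:
    """Minimal sketch of a value-loss-prioritised level sampler.

    Levels whose most recent value loss is high are replayed more often,
    which (per the paper's argument) helps keep the mutual information
    between the agent's internal representation and the level identities low.
    """

    def __init__(self, num_levels, temperature=0.1, staleness_coef=0.1):
        self.scores = np.zeros(num_levels)       # last observed value loss per level
        self.last_visit = np.zeros(num_levels)   # step at which each level was last played
        self.step = 0
        self.temperature = temperature
        self.staleness_coef = staleness_coef

    def update(self, level_id, value_loss):
        """Record the latest value (e.g. TD or GAE) loss observed on a level."""
        self.scores[level_id] = value_loss
        self.last_visit[level_id] = self.step
        self.step += 1

    def sample(self):
        """Draw the next training level, mixing loss-rank and staleness weights."""
        ranks = np.argsort(np.argsort(-self.scores)) + 1            # rank 1 = highest loss
        score_probs = (1.0 / ranks) ** (1.0 / self.temperature)
        score_probs /= score_probs.sum()

        staleness = self.step - self.last_visit
        stale_probs = staleness / max(staleness.sum(), 1.0)

        probs = (1 - self.staleness_coef) * score_probs + self.staleness_coef * stale_probs
        probs /= probs.sum()                                         # renormalise
        return np.random.choice(len(self.scores), p=probs)
```

In a training loop one would call `sample()` to pick the next level and `update(level_id, value_loss)` after each rollout; uniform sampling corresponds to replacing `sample()` with a uniform random draw over level ids.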
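The abstract also describes SSED generating new levels with a variational autoencoder so that training levels reduce MI without drifting from the distribution of interest. Below is a rough, self-contained sketch of that idea, assuming levels can be encoded as fixed-size flattened binary grids; the architecture and names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class LevelVAE(nn.Module):
    """Toy VAE over fixed-size levels represented as flattened binary grids."""

    def __init__(self, level_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(level_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, level_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation
        return self.decoder(z), mu, logvar


def vae_loss(recon_logits, x, mu, logvar):
    # The reconstruction term anchors generated levels to the training set
    # (limiting distributional shift); the KL term regularises the latent space.
    recon = nn.functional.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl


def generate_levels(vae, n, latent_dim=16):
    # New training levels are decoded from latent samples, so they augment
    # the level set while staying close to the distribution of interest.
    with torch.no_grad():
        z = torch.randn(n, latent_dim)
        return torch.sigmoid(vae.decoder(z))
```

Generated levels could then be scored and sampled alongside the original set, for instance with the value-loss prioritisation sketched above.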
Related papers
- Hybrid Classification-Regression Adaptive Loss for Dense Object Detection [19.180514552400883]
We propose a Hybrid Classification-Regression Adaptive Loss, termed HCRAL.
We introduce the Residual of Classification and IoU (RCI) module for cross-task supervision, addressing task inconsistencies, and the Conditioning Factor (CF) to focus on difficult-to-train samples within each task.
We also introduce a new strategy named Expanded Adaptive Training Sample Selection (EATSS) to provide additional samples that exhibit classification and regression inconsistencies.
arXiv Detail & Related papers (2024-08-30T10:31:39Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents.
We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data.
We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED).
arXiv Detail & Related papers (2024-02-05T19:47:45Z)
- Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection [64.50126371767476]
We propose Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation (UniCon-HA)
We explicitly encourage the concentration of inliers and the dispersion of virtual outliers via supervised and unsupervised contrastive losses.
Our method is evaluated under three AD settings including unlabeled one-class, unlabeled multi-class, and labeled multi-class.
arXiv Detail & Related papers (2023-08-20T04:01:50Z)
- Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization [37.464005524259356]
We present a new attack based on sensitivity curve maximization (SCM).
We demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small but effective perturbations.
arXiv Detail & Related papers (2023-04-27T08:41:57Z)
- Generalized Inter-class Loss for Gait Recognition [11.15855312510806]
Gait recognition is a unique biometric technique that can be performed at a long distance non-cooperatively.
Previous gait works focus on minimizing the intra-class variance while overlooking the importance of constraining the inter-class variance.
We propose a generalized inter-class loss which resolves the inter-class variance from both sample-level feature distribution and class-level feature distribution.
arXiv Detail & Related papers (2022-10-13T06:44:53Z)
- Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performances and fail at generalizing.
In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum.
Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z)
- Revisiting Deep Semi-supervised Learning: An Empirical Distribution Alignment Framework and Its Generalization Bound [97.93945601881407]
We propose a new deep semi-supervised learning framework called Semi-supervised Learning by Empirical Distribution Alignment (SLEDA)
We show the generalization error of semi-supervised learning can be effectively bounded by minimizing the training error on labeled data.
Building upon our new framework and the theoretical bound, we develop a simple and effective deep semi-supervised learning method called Augmented Distribution Alignment Network (ADA-Net)
arXiv Detail & Related papers (2022-03-13T11:59:52Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Adaptive Adversarial Logits Pairing [65.51670200266913]
Models trained with the adversarial training solution Adversarial Logits Pairing (ALP) tend to rely on fewer high-contribution features than vulnerable models.
Motivated by these observations, we design an Adaptive Adversarial Logits Pairing (AALP) solution by modifying the training process and training target of ALP.
AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features.
arXiv Detail & Related papers (2020-05-25T03:12:20Z)