An Optimisation Framework for Unsupervised Environment Design
- URL: http://arxiv.org/abs/2505.20659v2
- Date: Wed, 09 Jul 2025 09:50:34 GMT
- Title: An Optimisation Framework for Unsupervised Environment Design
- Authors: Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster
- Abstract summary: Unsupervised environment design (UED) aims to maximise an agent's general robustness. We provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method.
- Score: 88.29733214939544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods offer guarantees only if they reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments of varying difficulty.
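The nonconvex-strongly-concave objective mentioned in the abstract is the setting in which two-timescale gradient descent-ascent is a standard provably convergent approach. The sketch below is only a generic illustration of that minimax structure on a toy objective; the function, its gradients, and the step sizes are illustrative assumptions and do not reproduce the paper's UED objective or algorithm.

```python
# Minimal sketch of two-timescale gradient descent-ascent (GDA) for
# min_x max_y f(x, y), with f nonconvex in x and strongly concave in y.
# The toy objective and step sizes are assumptions, not from the paper.
import numpy as np

def f_grad(x, y):
    # f(x, y) = sin(x) * y - 0.5 * y**2  (strongly concave in y)
    return np.cos(x) * y, np.sin(x) - y  # (df/dx, df/dy)

x, y = 2.0, 0.0
eta_x, eta_y = 1e-3, 1e-1  # the descent player moves on a slower timescale
for _ in range(10_000):
    gx, gy = f_grad(x, y)
    x -= eta_x * gx  # minimising player: gradient descent
    y += eta_y * gy  # maximising player: gradient ascent

print(f"approximate stationary point: x={x:.3f}, y={y:.3f}")
```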
Related papers
- Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence [0.562479170374811]
We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes. Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
arXiv Detail & Related papers (2025-06-27T14:21:03Z) - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds for our satisficing-based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z) - Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters. We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z) - Data-Driven Goal Recognition Design for General Behavioral Agents [14.750023724230774]
We introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models.
We propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments.
arXiv Detail & Related papers (2024-04-03T20:38:22Z) - End-to-end Conditional Robust Optimization [6.363653898208231]
Conditional Robust Optimization (CRO) combines uncertainty quantification with robust optimization to promote safety and reliability in high-stakes applications.
We propose a novel end-to-end approach to train a CRO model in a way that accounts for both the empirical risk of the prescribed decisions and the quality of conditional coverage of the contextual uncertainty set that supports them.
We show that the proposed training algorithms produce decisions that outperform traditional estimate-then-optimize approaches.
arXiv Detail & Related papers (2024-03-07T17:16:59Z) - Iterative Reachability Estimation for Safe Reinforcement Learning [23.942701020636882]
We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained reinforcement learning (RL) environments.
In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety.
We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo.
arXiv Detail & Related papers (2023-09-24T02:36:42Z) - Constrained Environment Optimization for Prioritized Multi-Agent Navigation [11.473177123332281]
This paper treats the environment as a decision variable in a system-level optimization problem.
We propose novel problems of unprioritized and prioritized environment optimization.
We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness.
arXiv Detail & Related papers (2023-05-18T18:55:06Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning [3.9821399546174825]
We introduce a deep reinforcement learning framework for safe decision making in uncertain environments.
We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems.
In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
arXiv Detail & Related papers (2023-01-30T00:37:06Z) - Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z) - On the Convergence and Robustness of Adversarial Training [134.25999006326916]
Adversarial training with Projected Gradient Descent (PGD) is among the most effective approaches.
We propose a dynamic training strategy to increase the convergence quality of the generated adversarial examples.
Our theoretical and empirical results show the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-12-15T17:54:08Z)
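For reference, the entry above concerns adversarial training with Projected Gradient Descent (PGD). The sketch below shows a generic PGD inner attack loop and outer training step in PyTorch; the epsilon, step size, and number of steps are placeholder assumptions, and this is the standard procedure rather than the dynamic strategy proposed in that paper.

```python
# Generic sketch of adversarial training with Projected Gradient Descent (PGD).
# Hyperparameters are placeholder assumptions; this is the standard PGD loop,
# not the dynamic training strategy proposed in the paper above.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Search for an adversarial example within an L-infinity ball of radius eps."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()  # ascent step on the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                 # keep inputs in a valid range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One outer minimisation step on adversarially perturbed inputs."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```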