Related papers: Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

URL: http://arxiv.org/abs/2411.04466v1
Date: Thu, 07 Nov 2024 06:27:12 GMT
Title: Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity
Authors: Robby Costales, Stefanos Nikolaidis,
Abstract summary: DIVA is an evolutionary approach for generating diverse training tasks in complex, open-ended simulators. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior.
Score: 10.402855891273346
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot generalization--the goal of standard reinforcement learning (RL)--in favor of few-shot adaptation, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity--a similarly prohibitive assumption. In this work, we present DIVA, an evolutionary approach for generating diverse training tasks in such complex, open-ended simulators. Like unsupervised environment design (UED) methods, DIVA can be applied to arbitrary parameterizations, but can additionally incorporate realistically-available domain knowledge--thus inheriting the flexibility and generality of UED, and the supervised structure embedded in well-designed simulators exploited by DR and PG. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior, far outperforming competitive baselines from prior literature. These findings highlight the potential of such semi-supervised environment design (SSED) approaches, of which DIVA is the first humble constituent, to enable training in realistic simulated domains, and produce more robust and capable adaptive agents.

Related papers

Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization [6.619253289031494]
Single Domain Generalization aims to develop models capable of generalizing to unseen target domains using only one source domain.<n>We propose LEAwareSGD, a novel Lyapunov Exponent (LE)-guided optimization approach inspired by dynamical systems theory.<n>Experiments on PACS, OfficeHome, and DomainNet demonstrate that LEAwareSGD yields substantial generalization gains.
arXiv Detail & Related papers (2025-07-06T09:03:08Z)
MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR [59.83547898874152]
We introduce a sample-efficient, two-stage adaptation approach that integrates self-supervised learning with semi-supervised techniques.<n>MSDA is designed to enhance the robustness and generalization of ASR models.<n>We demonstrate that Meta PL can be applied effectively to ASR tasks, achieving state-of-the-art results.
arXiv Detail & Related papers (2025-05-30T14:46:05Z)
Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning [15.619925926862235]
GAP is a generalizable autonomous pentesting framework. It aims to realizes efficient policy training in realistic environments. It also trains agents capable of drawing inferences about other cases from one instance.
arXiv Detail & Related papers (2024-12-05T11:24:27Z)
LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots [20.715834172041763]
We propose a lifelong policy adaptation framework named LoopSR. It reconstructs the real-world environments back in simulation for further improvement. By leveraging the continual training, LoopSR achieves superior data efficiency compared with strong baselines.
arXiv Detail & Related papers (2024-09-26T16:02:25Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED) Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments. The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z)
INTAGS: Interactive Agent-Guided Simulation [4.04638613278729]
In many applications involving multi-agent system (MAS), it is imperative to test an experimental (Exp) autonomous agent in a high-fidelity simulator prior to its deployment to production. We propose a metric to distinguish between real and synthetic multi-agent systems, which is evaluated through the live interaction between the Exp and BG agents. We show that using INTAGS to calibrate the simulator can generate more realistic market data compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network approach.
arXiv Detail & Related papers (2023-09-04T19:56:18Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset. Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets. We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Heterogeneous Domain Adaptation and Equipment Matching: DANN-based Alignment with Cyclic Supervision (DBACS) [3.4519649635864584]
This work introduces the Domain Adaptation Neural Network with Cyclic Supervision (DBACS) approach. DBACS addresses the issue of model generalization through domain adaptation, specifically for heterogeneous data. This work also includes subspace alignment and a multi-view learning that deals with heterogeneous representations.
arXiv Detail & Related papers (2023-01-03T10:56:25Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation. The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
Zero-Shot Reinforcement Learning with Deep Attention Convolutional Neural Networks [12.282277258055542]
We show that a deep attention convolutional neural network (DACNN) with specific visual sensor configuration performs as well as training on a dataset with high domain and parameter variation at lower computational complexity. Our new architecture adapts perception with respect to the control objective, resulting in zero-shot learning without pre-training a perception network.
arXiv Detail & Related papers (2020-01-02T19:41:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.