Heterogeneous Adversarial Play in Interactive Environments
- URL: http://arxiv.org/abs/2510.18407v1
- Date: Tue, 21 Oct 2025 08:29:59 GMT
- Title: Heterogeneous Adversarial Play in Interactive Environments
- Authors: Manjie Xu, Xinyi Yang, Jiayu Zhan, Wei Liang, Chi Zhang, Yixin Zhu
- Abstract summary: Heterogeneous Adversarial Play (HAP) is an adversarial Automatic Curriculum Learning framework that formalizes teacher-student interactions as a minimax optimization. Our framework achieves performance parity with SOTA baselines while generating curricula that enhance learning efficacy in both artificial agents and human subjects.
- Score: 15.718025074467453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-play constitutes a fundamental paradigm for autonomous skill acquisition, whereby agents iteratively enhance their capabilities through self-directed environmental exploration. Conventional self-play frameworks exploit agent symmetry within zero-sum competitive settings, yet this approach proves inadequate for open-ended learning scenarios characterized by inherent asymmetry. Human pedagogical systems exemplify asymmetric instructional frameworks wherein educators systematically construct challenges calibrated to individual learners' developmental trajectories. The principal challenge resides in operationalizing these asymmetric, adaptive pedagogical mechanisms within artificial systems capable of autonomously synthesizing appropriate curricula without predetermined task hierarchies. Here we present Heterogeneous Adversarial Play (HAP), an adversarial Automatic Curriculum Learning framework that formalizes teacher-student interactions as a minimax optimization wherein task-generating instructor and problem-solving learner co-evolve through adversarial dynamics. In contrast to prevailing ACL methodologies that employ static curricula or unidirectional task selection mechanisms, HAP establishes a bidirectional feedback system wherein instructors continuously recalibrate task complexity in response to real-time learner performance metrics. Experimental validation across multi-task learning domains demonstrates that our framework achieves performance parity with SOTA baselines while generating curricula that enhance learning efficacy in both artificial agents and human subjects.
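The bidirectional feedback loop described in the abstract — an instructor that recalibrates task difficulty against real-time learner performance — can be illustrated with a toy sketch. This is not the paper's implementation; the success model, update rules, and all names are illustrative assumptions.

```python
import random


def success_prob(skill, difficulty):
    # Toy learner model: success is likely when skill exceeds task difficulty.
    return 1.0 / (1.0 + 2.718281828 ** (4 * (difficulty - skill)))


def run_curriculum(steps=200, lr=0.05, seed=0):
    """Adversarial teacher-student loop: the teacher keeps the task near the
    frontier where the student succeeds about half the time."""
    rng = random.Random(seed)
    skill, difficulty = 0.0, 0.0
    for _ in range(steps):
        # Student attempts the teacher's current task; success improves skill.
        if rng.random() < success_prob(skill, difficulty):
            skill += lr
        # Teacher recalibrates difficulty from observed performance,
        # targeting an intermediate success rate (maximal learning signal).
        difficulty += lr * (success_prob(skill, difficulty) - 0.5)
    return skill, difficulty


final_skill, final_difficulty = run_curriculum()
```

Under this toy dynamic the difficulty tracks the student's growing skill, producing a curriculum of steadily harder tasks rather than a static task distribution.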
Related papers
- Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback [2.165723322157105]
This work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment. The proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning. The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation.
arXiv Detail & Related papers (2026-03-04T13:07:42Z) - Human-Inspired Continuous Learning of Internal Reasoning Processes: Learning How to Think for Adaptive AI Systems [0.11844977816228043]
Internal reasoning processes are crucial for developing AI systems capable of sustained adaptation in dynamic real-world environments. We propose a human-inspired continuous learning framework that unifies reasoning, action, reflection, and verification within a sequential reasoning model.
arXiv Detail & Related papers (2026-02-12T03:19:04Z) - Agentic Reasoning for Large Language Models [122.81018455095999]
Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. Large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, but struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction.
arXiv Detail & Related papers (2026-01-18T18:58:23Z) - From Educational Analytics to AI Governance: Transferable Lessons from Complex Systems Interventions [0.0]
We argue that five core principles developed within CAPIRE transfer directly to the challenge of governing AI systems. The isomorphism is not merely analogical: both domains exhibit non-linearity, emergence, feedback loops, strategic adaptation, and path dependence. We propose Complex Systems AI Governance (CSAIG) as an integrated framework that operationalises these principles for regulatory design.
arXiv Detail & Related papers (2025-12-15T12:16:57Z) - SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z) - Social World Model-Augmented Mechanism Design Policy Learning [58.739456918502704]
We introduce SWM-AP (Social World Model-Augmented Mechanism Design Policy Learning), which learns a social world model hierarchically to enhance mechanism design. We show that SWM-AP outperforms established model-based and model-free RL baselines in cumulative rewards and sample efficiency.
arXiv Detail & Related papers (2025-10-22T06:01:21Z) - Fundamentals of Building Autonomous LLM Agents [64.39018305018904]
This paper reviews the architecture and implementation methods of agents powered by large language models (LLMs). The research aims to explore patterns to develop "agentic" LLMs that can automate complex tasks and bridge the performance gap with human capabilities.
arXiv Detail & Related papers (2025-10-10T10:32:39Z) - Large Language Models in Architecture Studio: A Framework for Learning Outcomes [0.0]
The study explores the role of large language models (LLMs) in the context of the architectural design studio. The main challenges include managing student autonomy, tensions in peer feedback, and the difficulty of balancing the transmission of technical knowledge with the stimulation of creativity in teaching.
arXiv Detail & Related papers (2025-10-08T02:51:22Z) - A Motivational Architecture for Open-Ended Learning Challenges in Robots [42.797352384123386]
We introduce H-GRAIL, a hierarchical architecture that autonomously discovers new goals, learns the required skills for their achievement, generates skill sequences for tackling interdependent tasks, and adapts to non-stationary environments. We tested H-GRAIL in a real robotic scenario, demonstrating how the proposed solutions effectively address the various challenges of open-ended learning.
arXiv Detail & Related papers (2025-06-23T09:46:05Z) - Reward-free World Models for Online Imitation Learning [25.304836126280424]
We propose a novel approach to online imitation learning that leverages reward-free world models. Our method learns environmental dynamics entirely in latent spaces without reconstruction, enabling efficient and accurate modeling. We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
arXiv Detail & Related papers (2024-10-17T23:13:32Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Sample-Efficient Reinforcement Learning with Symmetry-Guided Demonstrations for Robotic Manipulation [7.099237102357281]
Reinforcement learning (RL) suffers from low sample efficiency, particularly in high-dimensional continuous state-action spaces. We introduce Demo-EASE, a novel training framework using a dual-buffer architecture that stores both demonstrations and RL-generated experiences. Our results show that Demo-EASE significantly accelerates convergence and improves final performance compared to standard RL baselines.
arXiv Detail & Related papers (2023-04-12T11:38:01Z) - Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [64.0476282000118]
Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals.
While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent from each other, only few of them studied the autonomous acquisition of interdependent tasks.
In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture.
Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences.
arXiv Detail & Related papers (2022-05-16T10:43:01Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.