AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
- URL: http://arxiv.org/abs/2506.05980v3
- Date: Fri, 26 Sep 2025 08:14:01 GMT
- Title: AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
- Authors: Geonwoo Cho, Jaemoon Lee, Jaegyun Im, Subi Lee, Jihwan Lee, Sundong Kim
- Abstract summary: Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. We propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED). Our approach achieves performance that surpasses SBRL baselines across various benchmarks.
- Score: 4.722248376235009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. Effective skill learning requires jointly maximizing both exploration and skill diversity, yet existing methods struggle to optimize these two conflicting objectives simultaneously. In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both: during pre-training, a gradient-surgery projection balances the exploration and diversity gradients, and during fine-tuning, a skill selector exploits the learned diversity by choosing skills suited to downstream tasks. Our approach surpasses SBRL baselines across various benchmarks. Through an extensive ablation study, we identify the role of each component and show that every element of AMPED contributes to performance. We further provide theoretical and empirical evidence that, with a greedy skill selector, greater skill diversity reduces fine-tuning sample complexity. These results highlight the importance of explicitly harmonizing exploration and diversity and demonstrate the effectiveness of AMPED in enabling robust and generalizable skill learning. Project Page: https://geonwoo.me/amped/
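The gradient-surgery projection described in the abstract can be illustrated with a short sketch. This follows the generic PCGrad-style rule (when two objective gradients conflict, project each onto the normal plane of the other before summing); it is a minimal sketch of the general technique under that assumption, not AMPED's exact update, and the function name and toy gradients are hypothetical.

```python
import numpy as np

def gradient_surgery(g_explore, g_diverse):
    """PCGrad-style gradient surgery: if the exploration and diversity
    gradients conflict (negative inner product), project each onto the
    normal plane of the other before summing, so neither objective is
    pushed backwards. Generic sketch, not AMPED's exact rule."""
    g_e = np.asarray(g_explore, dtype=float)
    g_d = np.asarray(g_diverse, dtype=float)
    dot = g_e @ g_d
    if dot < 0:  # conflicting directions: remove each projection
        g_e_proj = g_e - (dot / (g_d @ g_d)) * g_d
        g_d_proj = g_d - (dot / (g_e @ g_e)) * g_e
        return g_e_proj + g_d_proj
    return g_e + g_d  # cooperative gradients are left untouched

# Toy example: a conflicting pair gets deconflicted before summing.
combined = gradient_surgery([1.0, 0.0], [-1.0, 1.0])
```

Projecting only when the inner product is negative means cooperative gradient pairs pass through unchanged, so the surgery intervenes exactly when the two objectives pull against each other.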
Related papers
- Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs [49.72591739116668]
Reinforcement Learning with Verifiable Rewards (RLVR) has become a widely adopted technique for enhancing the reasoning ability of Large Language Models (LLMs). Existing methods address this issue by imitating expert trajectories, which improves effectiveness but neglects diversity. We propose MENTOR: Mixed-policy Expert Navigation for Token-level Optimization of Reasoning.
arXiv Detail & Related papers (2025-10-05T10:38:55Z) - More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration [103.1589018460702]
The "guidance-on-demand" approach expands exploration while preserving the value of self-discovery. Experiments show AMPO substantially outperforms a strong baseline. Using four peer-sized teachers, our method achieves comparable results to approaches that leverage a single, more powerful teacher.
arXiv Detail & Related papers (2025-10-02T17:14:00Z) - VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills [0.0]
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. We introduce VendiRL, a unified framework for learning diversely diverse sets of skills.
arXiv Detail & Related papers (2025-09-03T01:53:29Z) - Jointly Reinforcing Diversity and Quality in Language Model Generations [64.72289248044514]
Post-training of Large Language Models (LLMs) often prioritizes accuracy and helpfulness at the expense of diversity. We address this challenge with Diversity-Aware Reinforcement Learning (DARLING), a framework that jointly optimizes for response quality and semantic diversity.
arXiv Detail & Related papers (2025-09-02T17:38:47Z) - Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning [25.18006424626525]
GO-Skill is a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. We integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks.
arXiv Detail & Related papers (2025-07-09T07:54:49Z) - Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a well-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z) - Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z) - ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery [12.277005054008017]
We propose Contrastive dynamic Skill Discovery (ComSD). ComSD generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, named contrastive dynamic reward. It can also discover distinguishable and far-reaching exploration skills in the challenging tree-like 2D maze.
arXiv Detail & Related papers (2023-09-29T12:53:41Z) - L-SA: Learning Under-Explored Targets in Multi-Target Reinforcement Learning [16.886934253882785]
We propose the L-SA (Learning by adaptive Sampling and Active querying) framework, which includes adaptive sampling and active querying.
In the L-SA framework, adaptive sampling dynamically samples targets with the highest increase of success rates.
It is experimentally demonstrated that the cyclic relationship between adaptive sampling and active querying effectively improves the sample richness of under-explored targets.
arXiv Detail & Related papers (2023-05-23T06:51:51Z) - Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z) - Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z) - Discovering Generalizable Skills via Automated Generation of Diverse Tasks [82.16392072211337]
We propose a method to discover generalizable skills via automated generation of a diverse set of tasks.
As opposed to prior work on unsupervised discovery of skills, our method pairs each skill with a unique task produced by a trainable task generator.
A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective.
The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks.
arXiv Detail & Related papers (2021-06-26T03:41:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.