Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2502.03752v3
- Date: Thu, 09 Oct 2025 11:16:53 GMT
- Title: Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
- Authors: Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han,
- Abstract summary: Self-Improving Skill Learning (SISL) performs self-guided skill refinement using decoupled high-level and skill improvement policies.<n>SISL consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks.
- Score: 6.959686714606018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks.
Related papers
- EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models [57.75717492488268]
Vision-Language-Action (VLA) models have advanced robotic manipulation by leveraging large language models.<n>Supervised Finetuning (SFT) requires hundreds of demonstrations per task, rigidly memorizing trajectories, and failing to adapt when deployment conditions deviate from training.<n>We introduce EVOLVE-VLA, a test-time training framework enabling VLAs to continuously adapt through environment interaction with minimal or zero task-specific demonstrations.
arXiv Detail & Related papers (2025-12-16T18:26:38Z) - Reinforcement Learning via Implicit Imitation Guidance [49.88208134736617]
A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy.<n>We propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints.<n>Our approach achieves up to 2-3x improvement over prior reinforcement learning from offline methods across seven simulated continuous control tasks.
arXiv Detail & Related papers (2025-06-09T07:32:52Z) - SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [68.9300049150948]
We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID)<n>Existing data collection approaches yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions.<n>We present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood.
arXiv Detail & Related papers (2025-05-04T13:00:29Z) - SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation [3.1997825444285457]
Real-world robot manipulation in dynamic unstructured environments requires lifelong adaptability to evolving objects, scenes and tasks.<n>Traditional imitation learning relies on static training paradigms, which are ill-suited for lifelong adaptation.<n>We propose Skill Prompts-based HiErarchical Continual Imitation Learning (SPECI), a novel end-to-end hierarchical CIL policy architecture for robot manipulation.
arXiv Detail & Related papers (2025-04-22T03:30:38Z) - Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment [14.458170645422564]
We propose Dynamic Contrastive Skill Learning (DCSL), a novel framework that redefines skill representation and learning.<n>DCSL introduces three key ideas: state-transition based skill representation, skill similarity function learning, and dynamic skill length adjustment.<n>Our approach enables more flexible and adaptive skill extraction, particularly in complex or noisy datasets, and demonstrates competitive performance compared to existing methods in task completion and efficiency.
arXiv Detail & Related papers (2025-04-21T02:11:39Z) - Robust Offline Imitation Learning Through State-level Trajectory Stitching [37.281554320048755]
Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations.
Recent advances in offline IL have incorporated suboptimal, unlabeled datasets into the training.
We propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics.
arXiv Detail & Related papers (2025-03-28T15:28:36Z) - Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations [23.15178050525514]
We propose a two-stage method dubbed as Skill-enhanced Reinforcement Learning Acceleration (SeRLA)<n>SeRLA introduces a skill-level adversarial Positive-Unlabeled (PU) learning model to extract useful skill prior knowledge.<n>It then deploys a skill-based soft actor-critic algorithm to leverage this acquired prior knowledge in the downstream online RL stage.
arXiv Detail & Related papers (2024-12-09T04:58:14Z) - Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning [39.991887534269445]
Disentangled Unsupervised Skill Discovery (DUSDi) is a method for learning disentangled skills that can be efficiently reused to solve downstream tasks.
DUSDi decomposes skills into disentangled components, where each skill component only affects one factor of the state space.
DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks.
arXiv Detail & Related papers (2024-10-15T04:13:20Z) - Skill Learning Using Process Mining for Large Language Model Plan Generation [0.0]
Large language models (LLMs) hold promise for generating plans for complex tasks.
Their effectiveness is limited by sequential execution, lack of control flow models, and difficulties in skill retrieval.
We introduce a novel approach to skill learning in LLMs by integrating process mining techniques.
arXiv Detail & Related papers (2024-10-14T12:48:42Z) - DIDA: Denoised Imitation Learning based on Domain Adaptation [28.36684781402964]
We focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise.
We propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data.
Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
arXiv Detail & Related papers (2024-04-04T11:29:05Z) - Towards Automated Knowledge Integration From Human-Interpretable Representations [55.2480439325792]
We introduce and motivate theoretically the principles of informed meta-learning enabling automated and controllable inductive bias selection.<n>We empirically demonstrate the potential benefits and limitations of informed meta-learning in improving data efficiency and generalisation.
arXiv Detail & Related papers (2024-02-25T15:08:37Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Algorithm Design for Online Meta-Learning with Task Boundary Detection [63.284263611646]
We propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments.
We first propose two simple but effective detection mechanisms of task switches and distribution shift.
We show that a sublinear task-averaged regret can be achieved for our algorithm under mild conditions.
arXiv Detail & Related papers (2023-02-02T04:02:49Z) - Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise
Datasets [15.206465106699293]
Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience.
Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise.
This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
arXiv Detail & Related papers (2021-10-10T03:55:17Z) - Example-Driven Model-Based Reinforcement Learning for Solving
Long-Horizon Visuomotor Tasks [85.56153200251713]
We introduce EMBR, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks.
On a Franka Emika robot arm, we find that EMBR enables the robot to complete three long-horizon visuomotor tasks at 85% success rate.
arXiv Detail & Related papers (2021-09-21T16:48:07Z) - Reset-Free Lifelong Learning with Skill-Space Planning [105.00539596788127]
We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL.
LiSP learns the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model.
We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments.
arXiv Detail & Related papers (2020-12-07T09:33:02Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.