Task-Centric Policy Optimization from Misaligned Motion Priors
- URL: http://arxiv.org/abs/2601.19411v2
- Date: Tue, 03 Feb 2026 09:56:39 GMT
- Title: Task-Centric Policy Optimization from Misaligned Motion Priors
- Authors: Ziang Zheng, Kai Feng, Yi Nie, Shentao Qin
- Abstract summary: We propose a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments.
- Score: 5.008550719179743
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose Task-Centric Motion Priors (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
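The abstract contrasts linear reward mixing with an update that admits the imitation signal only when it is compatible with task progress. The sketch below illustrates that contrast on toy gradient vectors. It is not the TCMP algorithm, whose exact update is not given in this listing; it swaps in a generic projection rule in the spirit of gradient surgery, and the names `linear_mix`, `task_priority_update`, and the weight `w` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def linear_mix(g_task, g_imit, w=0.5):
    """Baseline: fixed-weight sum of task and imitation gradients."""
    return [t + w * i for t, i in zip(g_task, g_imit)]


def task_priority_update(g_task, g_imit, w=0.5):
    """Illustrative task-priority rule (an assumption, not the paper's update).

    If the imitation gradient conflicts with the task gradient (negative
    inner product), remove its component along the task gradient so the
    combined direction never reduces first-order task improvement.
    """
    conflict = dot(g_task, g_imit)
    norm_sq = dot(g_task, g_task)
    if conflict < 0.0 and norm_sq > 0.0:
        scale = conflict / norm_sq
        g_imit = [i - scale * t for t, i in zip(g_task, g_imit)]
    return [t + w * i for t, i in zip(g_task, g_imit)]


# Misaligned prior: the imitation gradient points against the task gradient.
g_task = [1.0, 0.0]
g_imit = [-3.0, 1.0]
mixed = linear_mix(g_task, g_imit)
prioritized = task_priority_update(g_task, g_imit)
print("linear mix:", mixed, "task alignment:", dot(mixed, g_task))
print("task priority:", prioritized, "task alignment:", dot(prioritized, g_task))
```

In this toy case the fixed-weight mix ends up pointing against the task gradient, while the projected update keeps a positive component along it and still uses the part of the imitation signal that is orthogonal to the task direction. When the two gradients already agree, both rules return the same direction.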
Related papers
- On the Paradoxical Interference between Instruction-Following and Task Solving [50.75960598434753]
Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
arXiv Detail & Related papers (2026-01-29T17:48:56Z)
- Learning to Move in Rhythm: Task-Conditioned Motion Policies with Orbital Stability Guarantees [45.137864140049814]
We introduce Orbitally Stable Motion Primitives (OSMPs) - a framework that combines a learned diffeomorphic encoder with a supercritical Hopf bifurcation in latent space. We validate the proposed approach through extensive simulation and real-world experiments across a diverse range of robotic platforms.
arXiv Detail & Related papers (2025-07-12T17:10:03Z)
- MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging [29.58798660724693]
Continual model merging integrates independently fine-tuned models sequentially without access to the original training data. We propose MINGLE, a novel framework for Test-Time Continual Model Merging. MINGLE achieves robust generalization, significantly reduces forgetting, and consistently surpasses previous state-of-the-art methods by 7-9% on average.
arXiv Detail & Related papers (2025-05-17T07:24:22Z)
- Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets. We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z)
- Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation [12.377289165111028]
Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task.
This work proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks.
We define a task-independent MDP to train RL policies using only a single demonstration per task generated from a model-based trajectory.
arXiv Detail & Related papers (2024-10-17T17:46:27Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
arXiv Detail & Related papers (2023-03-28T16:57:12Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors [72.62423312645953]
Humans intuitively solve tasks in versatile ways, varying their behavior in terms of trajectory-based planning and for individual steps.
Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting.
Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility.
arXiv Detail & Related papers (2022-10-17T16:42:59Z)
- Regularized Soft Actor-Critic for Behavior Transfer Learning [10.519534498340482]
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior.
We propose a method called Regularized Soft Actor-Critic which formulates the main task and the imitation task.
We evaluate our method on continuous control tasks relevant to video games applications.
arXiv Detail & Related papers (2022-09-27T07:52:04Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)