Related papers: Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

URL: http://arxiv.org/abs/2603.01452v1
Date: Mon, 02 Mar 2026 05:07:43 GMT
Title: Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning
Authors: Shaohuai Liu, Weirui Ye, Yilun Du, Le Xie,
Abstract summary: We argue that effective online learning should scale the emphnumber of tasks, rather than the number of samples per task.<n>This regime reveals a structural advantage of model-based reinforcement learning.<n>We instantiate this idea with textbfEfficientZero-Multitask (EZ-M), a sample-efficient multi-task algorithm for online learning.
Score: 49.82882141491629
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Developing generalist robots capable of mastering diverse skills remains a central challenge in embodied AI. While recent progress emphasizes scaling model parameters and offline datasets, such approaches are limited in robotics, where learning requires active interaction. We argue that effective online learning should scale the \emph{number of tasks}, rather than the number of samples per task. This regime reveals a structural advantage of model-based reinforcement learning (MBRL). Because physical dynamics are invariant across tasks, a shared world model can aggregate multi-task experience to learn robust, task-agnostic representations. In contrast, model-free methods suffer from gradient interference when tasks demand conflicting actions in similar states. Task diversity therefore acts as a regularizer for MBRL, improving dynamics learning and sample efficiency. We instantiate this idea with \textbf{EfficientZero-Multitask (EZ-M)}, a sample-efficient multi-task MBRL algorithm for online learning. Evaluated on \textbf{HumanoidBench}, a challenging whole-body control benchmark, EZ-M achieves state-of-the-art performance with significantly higher sample efficiency than strong baselines, without extreme parameter scaling. These results establish task scaling as a critical axis for scalable robotic learning. The project website is available \href{https://yewr.github.io/ez_m/}{here}.

Related papers

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning [32.13266149565313]
Multi-task world models like UniZero excel in single-task settings.<n>We find that gradient conflicts and the loss of model plasticity often constrain their sample efficiency.<n>In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process.
arXiv Detail & Related papers (2025-09-09T17:27:53Z)
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning.<n>We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel value-based reinforcement learning algorithm that learns a critic network that outputs Q-values over a sequence of actions.<n>Experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
LIMT: Language-Informed Multi-Task Visual World Models [6.128332310539627]
Multi-task reinforcement learning can be very challenging due to the increased sample complexity and the potentially conflicting task objectives. We propose a method for learning multi-task visual world models, leveraging pre-trained language models to extract semantically meaningful task representations. Our results highlight the benefits of using language-driven task representations for world models and a clear advantage of model-based multi-task learning over the more common model-free paradigm.
arXiv Detail & Related papers (2024-07-18T12:40:58Z)
Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.<n>On diverse real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot [27.410618312830497]
In this paper, we propose GeRM (Generalist Robotic Model) We utilize offline reinforcement learning to optimize data utilization strategies. We employ a transformer-based VLA network to process multi-modal inputs and output actions.
arXiv Detail & Related papers (2024-03-20T07:36:43Z)
Exploring intra-task relations to improve meta-learning algorithms [1.223779595809275]
We aim to exploit external knowledge of task relations to improve training stability via effective mini-batching of tasks. We hypothesize that selecting a diverse set of tasks in a mini-batch will lead to a better estimate of the full gradient and hence will lead to a reduction of noise in training.
arXiv Detail & Related papers (2023-12-27T15:33:52Z)
Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
Enhancing Robotic Manipulation: Harnessing the Power of Multi-Task Reinforcement Learning and Single Life Reinforcement Learning in Meta-World [0.0]
This research project is to enable a robotic arm to execute seven distinct tasks within the Meta World environment. A trained model will serve as a source of prior data for the single-life RL algorithm. An ablation study demonstrates that MT-QWALE successfully completes tasks with a slightly larger number of steps even after hiding the final goal position.
arXiv Detail & Related papers (2023-10-23T06:35:44Z)
Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles. Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer. State of the art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks. We propose an evolutionary method that can generate a large scale multitask model and can support the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z)
Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.