Related papers: TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents

TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents

URL: http://arxiv.org/abs/2507.01823v1
Date: Wed, 02 Jul 2025 15:38:49 GMT
Title: TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents
Authors: Dmytro Kuzmenko, Nadiya Shvai,
Abstract summary: We present a novel approach to knowledge transfer in model-based reinforcement learning.<n>Our method efficiently distills a high-capacity multi-task agent into a compact model.<n>Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models.
Score: 1.6574413179773757
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a novel approach to knowledge transfer in model-based reinforcement learning, addressing the critical challenge of deploying large world models in resource-constrained environments. Our method efficiently distills a high-capacity multi-task agent (317M parameters) into a compact model (1M parameters) on the MT30 benchmark, significantly improving performance across diverse tasks. Our distilled model achieves a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93. This improvement demonstrates the ability of our distillation technique to capture and consolidate complex multi-task knowledge. We further optimize the distilled model through FP16 post-training quantization, reducing its size by $\sim$50\%. Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models, paving the way for more efficient and accessible multi-task reinforcement learning systems in robotics and other resource-constrained applications. Code available at https://github.com/dmytro-kuzmenko/td-mpc-opt.

Related papers

Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics [39.854659234873644]
We introduce Mixture-of-World Models (MoW), a scalable architecture that combines modular variational autoencoders for task-adaptive visual compression.<n>On the Atari 100k benchmark, a single MoW agent trained once on 26 Atari games achieves a mean human-normalized score of 110.4%.<n>MoW achieves a 74.5% average success rate within 300 thousand environment steps, establishing a new state of the art.
arXiv Detail & Related papers (2026-02-01T15:00:30Z)
Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning [8.995427413172148]
Small language models (SLMs) can achieve competitive performance in multitask prompt generation tasks.<n>We train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller.
arXiv Detail & Related papers (2025-02-14T01:39:45Z)
Knowledge Transfer in Model-Based Reinforcement Learning Agents for Efficient Multi-Task Learning [1.6574413179773757]
We propose an efficient knowledge transfer approach for model-based reinforcement learning.<n>We distill a high-capacity multi-task agent into a compact 1M parameter model, achieving state-of-the-art performance on the MT30 benchmark.<n>We apply FP16 post-training quantization, reducing the model size by 50% while maintaining performance.
arXiv Detail & Related papers (2025-01-09T15:55:08Z)
Active Data Curation Effectively Distills Large-Scale Multimodal Models [66.23057263509027]
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones.<n>In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining.<n>Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations.
arXiv Detail & Related papers (2024-11-27T18:50:15Z)
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge. We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z)
Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning [12.00246872965739]
We propose a novel dynamic self-adaptive multiscale distillation from pre-trained multimodal large model. Our strategy employs a multiscale perspective, enabling the extraction structural knowledge across from the pre-trained multimodal large model. Our methodology streamlines pre-trained multimodal large models using only their output features and original image-level information.
arXiv Detail & Related papers (2024-04-16T18:22:49Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution. We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture. We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning. We find that well-tuned model-free agents are strong baselines even for high DoF control problems. We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters. We also present new training methods to improve MoE sample efficiency and leverage expert pruning strategy to improve time efficiency. A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.