Improving Policy Optimization with Generalist-Specialist Learning
- URL: http://arxiv.org/abs/2206.12984v1
- Date: Sun, 26 Jun 2022 22:06:40 GMT
- Title: Improving Policy Optimization with Generalist-Specialist Learning
- Authors: Zhiwei Jia, Xuanlin Li, Zhan Ling, Shuang Liu, Yiran Wu, Hao Su
- Abstract summary: Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations.
We propose a novel generalist-specialist training framework.
Specifically, we first train a generalist on all environment variations; when it fails to improve, we launch a large population of specialists with weights cloned from the generalist.
We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks including Procgen, Meta-World and ManiSkill.
- Score: 23.480173193633252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalization in deep reinforcement learning over unseen environment
variations usually requires policy learning over a large set of diverse
training variations. We empirically observe that an agent trained on many
variations (a generalist) tends to learn faster at the beginning, yet its
performance plateaus at a less optimal level for a long time. In contrast, an
agent trained only on a few variations (a specialist) can often achieve high
returns under a limited computational budget. To have the best of both worlds,
we propose a novel generalist-specialist training framework. Specifically, we
first train a generalist on all environment variations; when it fails to
improve, we launch a large population of specialists with weights cloned from
the generalist, each trained to master a selected small subset of variations.
We finally resume the training of the generalist with auxiliary rewards induced
by demonstrations of all specialists. In particular, we investigate the timing
to start specialist training and compare strategies to learn generalists with
assistance from specialists. We show that this framework pushes the envelope of
policy learning on several challenging and popular benchmarks including
Procgen, Meta-World and ManiSkill.
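
The three-stage procedure described in the abstract can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the "policy" here is just a per-variation performance score, `train_step` is a stand-in for a deep-RL update, and the auxiliary-reward stage is simplified to pulling the generalist's scores toward the specialists'.

```python
def train_step(scores, subset, lr):
    """Stand-in for a policy update: nudge per-variation performance
    toward the maximum return of 1.0."""
    for v in subset:
        scores[v] += lr * (1.0 - scores[v])

def generalist_specialist(variations, plateau_eps=1e-3,
                          subset_size=2, specialist_steps=20):
    # Stage 1: train one generalist jointly on all variations until
    # its average performance plateaus.
    gen = {v: 0.0 for v in variations}
    prev = -1.0
    while True:
        train_step(gen, variations, lr=0.1 / len(variations))  # slow joint progress
        avg = sum(gen.values()) / len(gen)
        if avg - prev < plateau_eps:   # "when it fails to improve"
            break
        prev = avg

    # Stage 2: launch specialists with weights cloned from the
    # generalist, each trained on a small subset of variations.
    specialists = []
    for i in range(0, len(variations), subset_size):
        subset = variations[i:i + subset_size]
        spec = dict(gen)               # clone the generalist
        for _ in range(specialist_steps):
            train_step(spec, subset, lr=0.3)  # fast progress on few variations
        specialists.append((subset, spec))

    # Stage 3: resume generalist training with auxiliary rewards induced
    # by specialist demonstrations (simplified here to pulling the
    # generalist's scores toward each specialist's).
    for subset, spec in specialists:
        for v in subset:
            gen[v] += 0.5 * (spec[v] - gen[v])
    return gen

final = generalist_specialist([f"var{i}" for i in range(6)])
```

In this toy, the generalist plateaus below optimal because its learning rate is spread across all variations, while each specialist converges quickly on its small subset, mirroring the generalist/specialist trade-off the abstract observes.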
Related papers
- Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging [53.41119829581115]
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize.
They still fall short on new tasks not covered in the training data.
We develop a method that preserves the generalization capabilities of the generalist policy during finetuning.

arXiv Detail & Related papers (2025-12-09T08:02:11Z)
- Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training [105.74524789405514]
Adversarial training (AT) is currently the most effective defense for neural networks against adversarial attacks.
We propose to partition the overall generalization goal into multiple sub-tasks, each assigned to a dedicated base learner.
In the later stages of training, we interpolate their parameters to form a knowledgeable global learner.
We term this framework Generalist and introduce three variants tailored to different application scenarios.
arXiv Detail & Related papers (2025-10-15T09:47:54Z)
- BTS: Harmonizing Specialized Experts into a Generalist LLM [52.026293450944635]
Branch-Train-Stitch (BTS) is an efficient training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model.
Compared to alternative model merging approaches, BTS yields the best generalist performance on a variety of downstream tasks.
arXiv Detail & Related papers (2025-01-31T07:54:34Z)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.
We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.
Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
- GSL-PCD: Improving Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning [0.0]
We propose Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning (GSL-PCD)
Our approach clusters environment variations based on features extracted from object point clouds and uses balanced clustering to assign similar variations to the same specialist.
Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that point cloud feature-based partitioning outperforms vanilla partitioning by 9.4%, with a fixed number of specialists, and reduces computational and sample requirements by 50% to achieve comparable performance.
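
The partitioning idea above (similar variations assigned to the same specialist, with balanced group sizes) can be illustrated with a toy balanced partition. This is a hypothetical simplification: real point-cloud features are high-dimensional and the paper uses balanced clustering, while here each variation is reduced to a single scalar feature (say, object size) and groups are formed by sorting and slicing into equal chunks.

```python
import math

def balanced_partition(features, n_groups):
    """Assign variations to equally sized groups so that variations with
    similar (scalar) features share a specialist: sort the variation
    indices by feature value, then slice the sorted order into
    contiguous chunks of equal size."""
    cap = math.ceil(len(features) / n_groups)
    order = sorted(range(len(features)), key=lambda i: features[i])
    return [order[g * cap:(g + 1) * cap] for g in range(n_groups)]

# Six variations described by a scalar feature (e.g. object size
# extracted from a point cloud).
groups = balanced_partition([0.9, 0.1, 0.5, 0.2, 0.8, 0.6], n_groups=3)
# groups → [[1, 3], [2, 5], [4, 0]]: nearest feature values end up together.
```

The balance constraint (equal chunk sizes) matters because each specialist gets a comparable compute budget; plain clustering could leave one specialist with most of the variations.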
arXiv Detail & Related papers (2024-11-11T06:03:42Z)
- Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model.
Our experiments assess four target tasks with distinct coverage levels.
The effect is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z)
- Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
- Guide Your Agent with Adaptive Multimodal Rewards [107.08768813632032]
This work presents Adaptive Return-conditioned Policy (ARP), an efficient framework to enhance the agent's generalization ability.
Our key idea is to calculate a similarity between visual observations and natural language instructions in the pre-trained multimodal embedding space.
Because the multimodal rewards provide adaptive signals at each timestep, our ARP effectively mitigates the goal misgeneralization.
arXiv Detail & Related papers (2023-09-19T17:39:20Z)
- Generalist: Decoupling Natural and Robust Generalization [14.244311026737666]
We propose a bi-expert framework called Generalist, where we simultaneously train base learners with task-aware strategies.
Generalist achieves high accuracy on natural examples while maintaining considerable robustness to adversarial ones.
arXiv Detail & Related papers (2023-03-24T05:24:23Z)
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks [39.69378006723682]
Generalization of neural networks is crucial for deploying them safely in the real world.
In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch.
We then propose Diversify-Aggregate-Repeat Training (DART) strategy that first trains diverse models using different augmentations (or domains) to explore the loss basin.
We find that Repeating the step of aggregation throughout training improves the overall optimization trajectory and also ensures that the individual models have a sufficiently low loss barrier to obtain improved generalization on combining them.
arXiv Detail & Related papers (2023-02-28T15:54:47Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice, learned optimizers designed for supervised learning do not work well even in simple RL tasks.
The agent-gradient distribution is non-independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning [12.170248966278281]
In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
arXiv Detail & Related papers (2021-08-30T04:30:53Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Adversarial Training for Large Neural Language Models [107.84290922621163]
We show that adversarial pre-training can improve both generalization and robustness.
ALUM regularizes the training objective by applying perturbations in the embedding space that maximizes the adversarial loss.
ALUM can be further combined with task-specific fine-tuning to attain additional gains.
arXiv Detail & Related papers (2020-04-20T00:07:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.