Summary
This week's reinforcement learning theme centers on making agents learn richer behaviors through curriculum design and modular skill representations. Representative papers emphasize two linked needs: RL agents should acquire multiple viable behaviors for the same task, and curriculum mechanisms should be portable and scalable enough to support large, open-ended task spaces.
Situation
Representative introductions frame a common problem: standard RL policies often converge to a single behavior even when diverse solutions would improve robustness, adaptability, and recovery under changing conditions. One line of work addresses this by combining mixture-of-experts policies with automatic curriculum learning, so different experts specialize over preferred regions of a continuous context space and together provide multiple task-solving modes without requiring prior knowledge of environment bounds.
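The idea of experts specializing over preferred regions of a context space can be illustrated with a minimal sketch. It assumes a 1-D context, a Gaussian context preference per expert, and a uniform prior over experts; these modeling choices and all names (`Expert`, `responsibilities`) are hypothetical and are not taken from the papers summarized here.

```python
import math
from dataclasses import dataclass


@dataclass
class Expert:
    """Hypothetical expert with a Gaussian preference over a 1-D context."""
    name: str
    mean: float  # centre of the expert's preferred context region
    std: float   # width of that region

    def context_density(self, c: float) -> float:
        # Gaussian density pi(c | o): how strongly this expert prefers context c.
        z = (c - self.mean) / self.std
        return math.exp(-0.5 * z * z) / (self.std * math.sqrt(2.0 * math.pi))


def responsibilities(experts: list[Expert], c: float) -> dict[str, float]:
    """Gating weights pi(o | c), assuming a uniform prior over experts."""
    densities = {e.name: e.context_density(c) for e in experts}
    total = sum(densities.values())
    return {name: d / total for name, d in densities.items()}


# Two experts that prefer different ends of the context range [0, 1].
experts = [Expert("near", mean=0.2, std=0.1), Expert("far", mean=0.8, std=0.1)]

weights = responsibilities(experts, c=0.25)   # context close to "near"
best = max(weights, key=weights.get)

midpoint = responsibilities(experts, c=0.5)   # context between both experts
```

In this toy setup a context near one expert's preferred region is routed almost entirely to that expert, while a context midway between the two is shared, which is one simple way several task-solving modes can coexist.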
A second line of work argues that curriculum learning is already central to many strong RL results, especially in environments with large, evolving, or open-ended task spaces, but remains difficult to reuse because curriculum logic is tightly coupled to training infrastructure. This motivates more modular curriculum systems that separate task sequencing from policy optimization, aiming to improve reproducibility and make automatic curricula usable across diverse RL libraries and harder domains. Supplemental evidence also points toward continually expanding skill repertoires, reinforcing the broader shift from single-policy training toward structured skill growth.
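Separating task sequencing from policy optimization can be sketched as a small interface that the training loop calls without knowing how tasks are chosen. The sketch below is an assumption-laden illustration, not the API of any library mentioned here: `Curriculum`, `LowestReturnFirst`, `train`, and `fake_episode` are hypothetical names, and the rollout is a stand-in that returns a fixed score per task.

```python
import random
from typing import Callable, Protocol


class Curriculum(Protocol):
    """Hypothetical interface: the trainer only sees these two methods."""

    def sample_task(self) -> int: ...

    def update(self, task: int, episode_return: float) -> None: ...


class LowestReturnFirst:
    """Toy curriculum: mostly sample the task with the lowest average return."""

    def __init__(self, num_tasks: int, explore: float = 0.1, seed: int = 0):
        self.avg_return = [0.0] * num_tasks
        self.counts = [0] * num_tasks
        self.explore = explore
        self.rng = random.Random(seed)

    def sample_task(self) -> int:
        # Occasionally pick a random task so every task keeps being revisited.
        if self.rng.random() < self.explore:
            return self.rng.randrange(len(self.avg_return))
        return min(range(len(self.avg_return)), key=self.avg_return.__getitem__)

    def update(self, task: int, episode_return: float) -> None:
        self.counts[task] += 1
        # Incremental mean of the returns observed on this task.
        self.avg_return[task] += (episode_return - self.avg_return[task]) / self.counts[task]


def train(curriculum: Curriculum, run_episode: Callable[[int], float], episodes: int) -> None:
    """Training loop that knows nothing about how tasks are ordered."""
    for _ in range(episodes):
        task = curriculum.sample_task()
        curriculum.update(task, run_episode(task))


def fake_episode(task: int) -> float:
    # Stand-in for a policy rollout: higher task index means lower return.
    return 1.0 / (1 + task)


curriculum = LowestReturnFirst(num_tasks=3)
train(curriculum, fake_episode, episodes=50)
```

Because `train` depends only on `sample_task` and `update`, a different sequencing rule could be swapped in without touching the optimization code, which is the kind of decoupling the modular curriculum argument calls for.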
Progress
SkillOS: Learning Skill Curation for Self-Evolving Agents
SkillOS introduces a learned skill curator that updates an external skill repository from accumulated agent experience, enabling self-evolving behavior. Rather than relying on fixed multi-skill structures or static curricula, it adds explicit learned curation of a growing skill set and reports gains over memory-free and memory-based baselines.
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
Skill1 trains a single policy to jointly optimize skill selection, utilization, and distillation toward shared task outcomes. This unified optimization replaces prior approaches that handled skill-library operations with separate or conflicting reward signals, reducing partial or inconsistent skill evolution.
Skill Neologisms: Towards Skill-based Continual Learning
Skill Neologisms demonstrates that learned soft-token skill representations can compose with out-of-distribution skills without weight updates. Whereas curricula reorder training over existing behaviors, this approach points toward scalable continual learning in which new skill tokens are created independently and then reused in composition.
Outlook
Outlook Summary
Near-term work will make diverse-skill RL more usable in hard control by adding replanning, recovery after perturbations, and more sample-efficient off-policy learning. Curriculum research will also move from generic task ordering toward meaningful task axes in long-horizon and multi-agent domains. These threads are likely to meet in growing skill libraries, where systems expand, organize, reuse, and curate skills while managing stability, drift, and normalization problems.
Three-Year Movement
Over three years, this path moves from research benchmarks toward operational infrastructure for adaptive RL. In the first year, teams mainly stress-test mixture-of-experts curricula, portable curriculum tools, and self-evolving training loops in harder control, long-horizon, and multi-agent settings. They add measures for diversity, robustness, perturbation recovery, and context coverage, not only average reward. By the second year, common modules start to emerge: curriculum generators choose tasks, context samplers vary situations, expert routers assign policies, skill repositories store behaviors, and drift monitors check instability. By the third year, labs and platform teams may manage curricula and skill libraries like normal machine-learning assets, with versions, tests, comparisons, and rollback when performance regresses. The main problems become redundant skills, stale skills, forgetting, negative transfer, and changing multi-agent conditions. Use is strongest in simulation-heavy robotics, open-ended game worlds, tool-use sandboxes, and complex decision-training environments, rather than full real-world autonomy.
Over three years, this scenario keeps the same technical direction but makes cost the main constraint. In the first year, researchers still build on mixture-of-experts curricula, portable curriculum systems, and self-evolving training loops. They also find that these methods multiply training and maintenance costs, because each expert may need its own curriculum and each software adapter must be kept working. The practical question shifts from growing the largest possible skill library to getting useful diversity under a fixed GPU budget. By the three-year point, the movement is likely to favor cost-aware curriculum design, simpler repeatable pipelines, and careful comparisons against cheaper baselines. Applied teams in robotics simulation, game AI, and agent benchmarks may prefer single-policy distillation, sequential curricula, and checkpointed skills when those are easier to rerun and audit. Skill libraries still matter, but they are judged by efficiency, maintainability, and reproducibility, not just by the number of behaviors they contain.
Over three years, this scenario turns diverse-skill RL into a provenance and testing problem as much as a training problem. The first-year setup is a move toward agents that learn more than one brittle behavior, with better perturbation recovery, off-policy sample efficiency, and curriculum tools that transfer across codebases. That creates a need to know why a skill appeared, improved, failed, or collapsed. Teams therefore begin recording curriculum choices, task contexts, expert assignments, replay data, and human anchors alongside checkpoints. By the three-year point, the likely movement is toward regression testing for skill libraries, where agents are checked for diversity and coverage as well as headline reward. A checkpoint would not just say how well the agent scored; it would also carry metadata about the training path that produced its skills. This makes long-horizon agents easier to compare, debug, and maintain when skills drift, disappear, or become too narrow.
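A checkpoint that carries skill provenance, together with a regression check on diversity and coverage, might look like the following sketch. Everything here is hypothetical: the record fields, the crude diversity measure (number of distinct non-empty solved-context sets), and the names `SkillRecord`, `Checkpoint`, and `regressions` are illustrative assumptions rather than an established format.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRecord:
    """Hypothetical per-skill entry stored alongside a checkpoint."""
    name: str
    expert_id: int
    contexts_solved: set[int]  # ids of task contexts this skill solves
    curriculum_trace: list[str] = field(default_factory=list)  # training path notes


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: headline reward plus skill provenance."""
    version: str
    mean_reward: float
    skills: list[SkillRecord]

    def coverage(self, all_contexts: set[int]) -> float:
        # Fraction of known contexts solved by at least one skill.
        # Assumes all_contexts is non-empty.
        solved = set().union(*(s.contexts_solved for s in self.skills))
        return len(solved & all_contexts) / len(all_contexts)

    def diversity(self) -> int:
        # Crude proxy: number of distinct, non-empty solved-context sets.
        return len({frozenset(s.contexts_solved) for s in self.skills if s.contexts_solved})


def regressions(old: Checkpoint, new: Checkpoint, all_contexts: set[int]) -> list[str]:
    """Return human-readable regressions of `new` relative to `old`."""
    problems = []
    if new.mean_reward < old.mean_reward:
        problems.append("mean reward dropped")
    if new.coverage(all_contexts) < old.coverage(all_contexts):
        problems.append("context coverage dropped")
    if new.diversity() < old.diversity():
        problems.append("skill diversity dropped")
    missing = {s.name for s in old.skills} - {s.name for s in new.skills}
    if missing:
        problems.append("skills disappeared: " + ", ".join(sorted(missing)))
    return problems


contexts = {0, 1, 2, 3}
v1 = Checkpoint("v1", 0.70, [SkillRecord("push", 0, {0, 1}), SkillRecord("lift", 1, {2, 3})])
v2 = Checkpoint("v2", 0.75, [SkillRecord("push", 0, {0, 1, 2})])
report = regressions(v1, v2, contexts)
```

In this example `v2` has a higher headline reward than `v1`, yet the check still flags it, because one skill has vanished and one context is no longer covered. That is the kind of failure a reward-only comparison would miss.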
References
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts - Authors: Onur Celik, Aleksandar Taranovic, Gerhard Neumann / License: CC-BY-4.0
- Syllabus: Portable Curricula for Reinforcement Learning Agents - Authors: Ryan Sullivan, Ryan Pégoud, Ameen Ur Rahmen, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P. Dickerson / License: CC-BY-4.0
- Guided Self-Evolving LLMs with Minimal Human Supervision - Authors: Wenhao Yu, Zhenwen Liang, Chengsong Huang, Kishan Panaganti, Tianqing Fang, Haitao Mi, Dong Yu / License: CC-BY-4.0