Procedural Generalization by Planning with Self-Supervised World Models
- URL: http://arxiv.org/abs/2111.01587v1
- Date: Tue, 2 Nov 2021 13:32:21 GMT
- Title: Procedural Generalization by Planning with Self-Supervised World Models
- Authors: Ankesh Anand, Jacob Walker, Yazhe Li, Eszter Vértes, Julian
Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick
- Abstract summary: We measure the generalization ability of model-based agents in comparison to their model-free counterparts.
We identify three factors of procedural generalization -- planning, self-supervised representation learning, and procedural data diversity.
We find that these factors do not always provide the same benefits for task generalization.
- Score: 10.119257232716834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the key promises of model-based reinforcement learning is the ability
to generalize using an internal model of the world to make predictions in novel
environments and tasks. However, the generalization ability of model-based
agents is not well understood because existing work has focused on model-free
agents when benchmarking generalization. Here, we explicitly measure the
generalization ability of model-based agents in comparison to their model-free
counterparts. We focus our analysis on MuZero (Schrittwieser et al., 2020), a
powerful model-based agent, and evaluate its performance on both procedural and
task generalization. We identify three factors of procedural generalization --
planning, self-supervised representation learning, and procedural data
diversity -- and show that by combining these techniques, we achieve
state-of-the-art generalization performance and data efficiency on Procgen
(Cobbe et al., 2019). However, we find that these factors do not always provide
the same benefits for the task generalization benchmarks in Meta-World (Yu et
al., 2019), indicating that transfer remains a challenge and may require
different approaches than procedural generalization. Overall, we suggest that
building generalizable agents requires moving beyond the single-task,
model-free paradigm and towards self-supervised model-based agents that are
trained in rich, procedural, multi-task environments.
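The abstract's core recipe combines a learned latent world model (as in MuZero) with self-supervised representation learning. One common instantiation of such an auxiliary objective is a latent-consistency loss: unroll the learned dynamics for several steps and penalize disagreement with the encoder's latents of the actually observed future frames. The sketch below is a minimal, hypothetical illustration of that idea using plain NumPy with random linear maps standing in for trained networks; the function names (`encode`, `dynamics`), dimensions, and the cosine-similarity form of the loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; a real agent would use trained neural
# networks for the encoder and dynamics functions.
OBS_DIM, LATENT_DIM, K = 8, 4, 3

W_enc = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1     # encoder h(o) -> z
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1  # dynamics g(z) -> z'

def encode(obs):
    """Map an observation to a latent state (stand-in for a learned encoder)."""
    return np.tanh(obs @ W_enc)

def dynamics(z):
    """Predict the next latent state (stand-in for learned latent dynamics)."""
    return np.tanh(z @ W_dyn)

def consistency_loss(obs_seq):
    """Self-supervised auxiliary loss: unroll the latent dynamics for K steps
    and penalize disagreement (negative cosine similarity) with the encoder's
    latents of the observed future frames. Zero when predictions match."""
    z = encode(obs_seq[0])
    loss = 0.0
    for k in range(1, K + 1):
        z = dynamics(z)                # predicted latent after k steps
        target = encode(obs_seq[k])    # latent of the real future observation
        cos = z @ target / (np.linalg.norm(z) * np.linalg.norm(target) + 1e-8)
        loss += 1.0 - cos              # per-step consistency penalty in [0, 2]
    return loss / K

obs_seq = rng.normal(size=(K + 1, OBS_DIM))
print(round(consistency_loss(obs_seq), 4))
```

In a full agent this term would be added to the usual reward, value, and policy losses, giving the representation a training signal that does not depend on reward alone.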
Related papers
- Capabilities Ain't All You Need: Measuring Propensities in AI [32.960519634809145]
We introduce the first formal framework for measuring AI propensities by using a bilogistic formulation for model success. We find that we can measure how much the propensity is shifted and what effect this has on the tasks. We obtain stronger predictive power when combining propensities and capabilities than either separately.
arXiv Detail & Related papers (2026-02-20T12:40:18Z) - The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments [0.11586753333439907]
We present an empirical study evaluating frontier AI models on 150 workplace tasks within a realistic e-commerce RL environment from Surge. Our analysis reveals an empirically-derived hierarchy of agentic capabilities that models must master for real-world deployment. Weaker models struggle with fundamental tool use and planning, whereas stronger models primarily fail on tasks requiring contextual inference beyond explicit instructions.
arXiv Detail & Related papers (2026-01-13T23:49:06Z) - Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging [53.41119829581115]
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize. They still fall short on new tasks not covered in the training data. We develop a method that preserves the generalization capabilities of the generalist policy during finetuning.
arXiv Detail & Related papers (2025-12-09T08:02:11Z) - What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns [27.126691338850254]
We introduce an architecture for studying the behavior of large language model (LLM) agents in the absence of externally imposed tasks. Our continuous reason and act framework, using persistent memory and self-feedback, enables sustained autonomous operation.
arXiv Detail & Related papers (2025-09-25T14:29:49Z) - Generalizability of Large Language Model-Based Agents: A Comprehensive Survey [32.40919143404769]
Large Language Model (LLM)-based agents are increasingly deployed in diverse domains like web navigation and household robotics. Despite growing interest, the concept of generalizability in LLM-based agents remains underdefined. This survey aims to establish a foundation for principled research on building LLM-based agents that generalize reliably across diverse applications.
arXiv Detail & Related papers (2025-09-19T18:13:32Z) - OMGPT: A Sequence Modeling Framework for Data-driven Operational Decision Making [5.419799294989289]
We build a Generative Pre-trained Transformer (GPT) model to solve sequential decision making tasks. We first propose a general sequence modeling framework to cover several operational decision making tasks. We then train a transformer-based neural network model (OMGPT) as a natural and powerful architecture for sequential modeling.
arXiv Detail & Related papers (2025-05-19T15:33:03Z) - PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization [12.15086255236961]
We show that the performance of such augmentation-based methods in the target domains universally fluctuates during training. We propose a novel generalization method, coined Space Ensemble with Entropy Regularization (PEER), that uses a proxy model to learn the augmented data.
arXiv Detail & Related papers (2025-05-19T06:01:11Z) - The Science of Evaluating Foundation Models [46.973855710909746]
This work focuses on three key aspects: (1) Formalizing the Evaluation Process by providing a structured framework tailored to specific use-case contexts; (2) Offering Actionable Tools and Frameworks such as checklists and templates to ensure thorough, reproducible, and practical evaluations; and (3) Surveying Recent Work with a targeted review of advancements in LLM evaluation, emphasizing real-world applications.
arXiv Detail & Related papers (2025-02-12T22:55:43Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - Toward Universal and Interpretable World Models for Open-ended Learning Agents [0.0]
We introduce a generic, compositional and interpretable class of generative world models that supports open-ended learning agents.
This is a sparse class of Bayesian networks capable of approximating a broad range of processes, which provide agents with the ability to learn world models in a manner that may be both interpretable and computationally scalable.
arXiv Detail & Related papers (2024-09-27T12:03:15Z) - Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z) - Building Socially-Equitable Public Models [32.35090986784889]
Public models offer predictions to a variety of downstream tasks and have played a crucial role in various AI applications.
We advocate for integrating the objectives of downstream agents into the optimization process.
We propose a novel Equitable Objective to address performance disparities and foster fairness among heterogeneous agents in training.
arXiv Detail & Related papers (2024-06-04T21:27:43Z) - Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z) - Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional
MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor to this phenomenon.
We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - Leveraging Approximate Symbolic Models for Reinforcement Learning via
Skill Diversity [32.35693772984721]
We introduce Symbolic-Model Guided Reinforcement Learning, wherein we formalize the relationship between the symbolic model and the underlying MDP.
We use these models to extract high-level landmarks that decompose the task.
At the low level, we learn a set of diverse policies for each possible task sub-goal identified by the landmark.
arXiv Detail & Related papers (2022-02-06T23:20:30Z) - A Self-Supervised Framework for Function Learning and Extrapolation [1.9374999427973014]
We present a framework for how a learner may acquire representations that support generalization.
We show the resulting representations outperform those from other models for unsupervised time series learning.
arXiv Detail & Related papers (2021-06-14T12:41:03Z) - Robustness to Augmentations as a Generalization metric [0.0]
Generalization is the ability of a model to predict on unseen domains.
We propose a method to predict the generalization performance of a model, based on the observation that models robust to augmentations generalize better than those that are not.
The proposed method was the first runner up solution for the NeurIPS competition on Predicting Generalization in Deep Learning.
arXiv Detail & Related papers (2021-01-16T15:36:38Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.