Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2310.00229v4
- Date: Sat, 16 Mar 2024 18:59:23 GMT
- Title: Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
- Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
- Abstract summary: Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused computation on the relevant parts of the environment.
- Score: 83.41487567765871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.
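As a rough illustration of the idea above, the sketch below shows what a proxy problem represented as a small directed graph over checkpoint states might look like: vertices are candidate checkpoints, edges carry estimated reward and discount for traveling between them (quantities Skipper learns end-to-end from hindsight, replaced here by a toy distance heuristic), and planning over the graph selects the next subgoal while low-level control is left to a goal-conditioned policy. All names (ProxyGraph, estimate_edge, next_checkpoint) and the heuristic are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
import random


@dataclass
class ProxyGraph:
    """Directed graph over checkpoint states; edges carry estimated reward/discount."""
    vertices: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> {"reward": r, "discount": g}

    def add_vertex(self, state):
        self.vertices.append(state)
        return len(self.vertices) - 1

    def add_edge(self, i, j, reward, discount):
        self.edges[(i, j)] = {"reward": reward, "discount": discount}


def estimate_edge(src, dst):
    # Stand-in for the learned edge estimators (trained from hindsight in the paper):
    # here a toy Manhattan-distance heuristic on (x, y) grid positions.
    dist = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return {"reward": -0.01 * dist, "discount": 0.95 ** max(dist, 1)}


def build_proxy(current_state, goal_state, checkpoints):
    # Vertices: current state, sampled checkpoints, and the goal; edges: all pairs.
    graph = ProxyGraph()
    ids = [graph.add_vertex(s) for s in (current_state, *checkpoints, goal_state)]
    for i in ids:
        for j in ids:
            if i != j:
                est = estimate_edge(graph.vertices[i], graph.vertices[j])
                graph.add_edge(i, j, est["reward"], est["discount"])
    return graph


def next_checkpoint(graph, start=0, goal=None, sweeps=50):
    # Value iteration on the small proxy graph; the chosen vertex becomes the
    # subgoal handed to a low-level, goal-conditioned policy (not shown).
    goal = len(graph.vertices) - 1 if goal is None else goal
    values = {v: 0.0 for v in range(len(graph.vertices))}
    values[goal] = 1.0  # terminal bonus for reaching the final goal
    for _ in range(sweeps):
        for v in range(len(graph.vertices)):
            if v != goal:
                values[v] = max(e["reward"] + e["discount"] * values[j]
                                for (i, j), e in graph.edges.items() if i == v)
    best_j, _ = max(((j, e["reward"] + e["discount"] * values[j])
                     for (i, j), e in graph.edges.items() if i == start),
                    key=lambda t: t[1])
    return graph.vertices[best_j]


if __name__ == "__main__":
    random.seed(0)
    checkpoints = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(5)]
    print("next checkpoint:", next_checkpoint(build_proxy((0, 0), (9, 9), checkpoints)))
```

Keeping the graph small is what makes the decision-making sparse: planning happens only over a handful of checkpoints rather than at every environment step.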
Related papers
- Symphony of experts: orchestration with adversarial insights in reinforcement learning [0.0]
We explore the concept of orchestration, where a set of expert policies guides decision-making.
We extend the analysis of natural policy gradient to arbitrary adversarial aggregation strategies.
A key point of our approach lies in its arguably more transparent proofs compared to existing methods.
arXiv Detail & Related papers (2023-10-25T08:53:51Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - An Option-Dependent Analysis of Regret Minimization Algorithms in
Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound on the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z) - Hierarchical State Abstraction Based on Structural Information
Principles [70.24495170921075]
We propose SISA, a novel state abstraction framework based on mathematical structural information principles, from an information-theoretic perspective.
SISA is a general framework that can be flexibly integrated with different representation-learning objectives to further improve their performance.
arXiv Detail & Related papers (2023-04-24T11:06:52Z) - Disambiguation of weak supervision with exponential convergence rates [88.99819200562784]
In weakly supervised learning, data are annotated with incomplete yet discriminative information.
In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets.
We propose an empirical disambiguation algorithm to recover full supervision from weak supervision.
arXiv Detail & Related papers (2021-02-04T18:14:32Z) - Generalized Inverse Planning: Learning Lifted non-Markovian Utility for
Generalizable Task Representation [83.55414555337154]
In this work, we study learning such lifted, non-Markovian utility from human demonstrations.
We propose a new quest, Generalized Inverse Planning, for utility learning in this domain.
We outline a computational framework, Maximum Entropy Inverse Planning (MEIP), that learns non-Markovian utility and associated concepts in a generative manner.
arXiv Detail & Related papers (2020-11-12T21:06:26Z) - Randomized Value Functions via Posterior State-Abstraction Sampling [21.931580762349096]
We argue that an agent seeking out latent task structure must explicitly represent and maintain its uncertainty over that structure.
We introduce a practical algorithm for doing this using two posterior distributions over state abstractions and abstract-state values.
In empirically validating our approach, we find substantial performance gains in the multi-task setting.
arXiv Detail & Related papers (2020-10-05T23:04:18Z) - Information-Theoretic Abstractions for Planning in Agents with
Computational Constraints [16.565205172451662]
We show how a path-planning problem in an environment can be systematically approximated by solving a sequence of easier-to-solve problems on abstractions of the original space.
A numerical example is presented to show the utility of the approach and to corroborate the theoretical findings.
arXiv Detail & Related papers (2020-05-19T17:32:10Z) - Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features and predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.