Efficient Planning in Reinforcement Learning via Model Introspection
- URL: http://arxiv.org/abs/2602.07719v1
- Date: Sat, 07 Feb 2026 21:49:21 GMT
- Title: Efficient Planning in Reinforcement Learning via Model Introspection
- Authors: Gabriel Stella
- Abstract summary: We show that when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. By reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning.
- Score: 2.538209532048867
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. The key to this ability is introspection: by reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. In this paper, we propose that this introspection can be thought of as program analysis. We discuss examples of how this approach can be applied to various kinds of models used in reinforcement learning. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning, demonstrating a novel link between reinforcement learning and classical planning.
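The abstract's link from model introspection to classical planning is easiest to picture when the model exposes its actions declaratively, as preconditions and effects over relational facts; a planner can then read that structure directly rather than treating the model as a black-box simulator. Below is a minimal sketch in that spirit; the action encoding, the blocks-world facts, and the breadth-first search are illustrative assumptions, not the paper's algorithm.

```python
# Minimal illustrative sketch (not the paper's algorithm): a relational-style
# model whose actions are declarative (preconditions / add / delete over ground
# facts), so a planner can "introspect" the model instead of treating it as a
# black-box simulator. All facts and action names below are hypothetical.
from collections import deque

# Each action: name -> (preconditions, add effects, delete effects)
ACTIONS = {
    "pick(a)":    ({"clear(a)", "handempty"}, {"holding(a)"},
                   {"clear(a)", "handempty"}),
    "stack(a,b)": ({"holding(a)", "clear(b)"}, {"on(a,b)", "clear(a)", "handempty"},
                   {"holding(a)", "clear(b)"}),
}

def successors(state):
    """Enumerate applicable actions by reading the model's preconditions."""
    for name, (pre, add, delete) in ACTIONS.items():
        if pre <= state:
            yield name, frozenset((state - delete) | add)

def plan(start, goal):
    """Breadth-first goal-oriented search over the introspected model."""
    start = frozenset(start)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for name, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

print(plan({"clear(a)", "clear(b)", "handempty"}, {"on(a,b)"}))
# -> ['pick(a)', 'stack(a,b)']
```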
Related papers
- Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers [22.72866404096086]
Amortized learning is the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. Current approaches struggle to scale to large datasets because their capacity to process task data at inference is often limited. We propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches.
arXiv Detail & Related papers (2025-10-13T14:40:47Z) - Looking beyond the next token [75.00751370502168]
We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process. Our method naturally enables the generation of long-term goals at no additional cost.
arXiv Detail & Related papers (2025-04-15T16:09:06Z) - Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning [41.13568563835089]
We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. We propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas.
arXiv Detail & Related papers (2024-10-06T01:30:40Z) - Deep Generative Models for Decision-Making and Control [4.238809918521607]
The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions for the uncovered problems.
We highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, can be reinterpreted as viable planning strategies for reinforcement learning problems.
arXiv Detail & Related papers (2023-06-15T01:54:30Z) - Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - Improving Artificial Teachers by Considering How People Learn and Forget [32.74828727144865]
The paper presents a novel model-based method for intelligent tutoring.
Model-based planning picks the best interventions via interactive learning of a user memory model's parameters.
arXiv Detail & Related papers (2021-02-08T13:05:58Z) - Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
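As a rough reference, behavior-prior methods in this line of work are typically built around a KL-regularized return of the following form, where the policy is rewarded for return while staying close to a learned prior; this is a generic formulation, and the paper's exact objective may differ:

```latex
% Generic KL-regularized objective with a learned behavior prior \pi_0;
% \alpha trades off reward against closeness to the prior (assumed form).
\mathcal{J}(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}
  \Big( r(s_t, a_t) - \alpha\, \mathrm{KL}\big( \pi(\cdot \mid s_t) \,\|\, \pi_0(\cdot \mid s_t) \big) \Big) \right]
```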
arXiv Detail & Related papers (2020-10-27T13:17:18Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
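One way to make this idea concrete is to condition the dynamics model on the goal and weight its prediction loss toward goal-relevant state dimensions, so task-irrelevant parts of the state carry little training signal. The sketch below is only an illustration under those assumptions (including the hypothetical relevance mask), not the paper's architecture.

```python
# Illustrative sketch only (not the paper's method): a dynamics model that is
# conditioned on the goal and penalized mainly on goal-relevant state
# dimensions, so task-irrelevant parts of the state are largely ignored.
import torch
import torch.nn as nn

class GoalConditionedDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        # Input is the concatenation of state, action, and goal (goal has state_dim).
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, goal):
        return self.net(torch.cat([state, action, goal], dim=-1))

def goal_weighted_loss(pred_next, next_state, relevance):
    """Weight per-dimension squared error by a goal-relevance mask in [0, 1]."""
    return (relevance * (pred_next - next_state) ** 2).mean()
```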
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)