Generative Intrinsic Optimization: Intrinsic Control with Model Learning
- URL: http://arxiv.org/abs/2310.08100v2
- Date: Tue, 14 Nov 2023 08:25:37 GMT
- Title: Generative Intrinsic Optimization: Intrinsic Control with Model Learning
- Authors: Jianfei Ma
- Abstract summary: The future sequence represents the outcome of executing an action in the environment.
Explicit outcomes may vary across states, returns, or trajectories, serving different purposes such as credit assignment or imitation learning.
We propose a policy iteration scheme that seamlessly incorporates the mutual information, ensuring convergence to the optimal policy.
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The future sequence represents the outcome of executing an action in the
environment (i.e., the trajectory onwards). When driven by the
information-theoretic concept of mutual information, the agent seeks maximally
informative consequences. Explicit outcomes may vary across states, returns, or
trajectories, serving different purposes such as credit assignment or imitation
learning. However, the inherent nature of incorporating intrinsic motivation
with reward maximization is often neglected. In this work, we propose a policy
iteration scheme that seamlessly incorporates the mutual information, ensuring
convergence to the optimal policy. Concurrently, a variational approach is
introduced that jointly learns the quantities necessary for estimating the
mutual information and the dynamics model, providing a general framework for
incorporating different forms of outcomes of interest. While we mainly focus on
theoretical analysis, our approach opens the possibility of leveraging
intrinsic control with model learning to enhance sample efficiency and to
incorporate the uncertainty of the environment into decision-making.
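To make the idea concrete, the sketch below shows one common way such a combination could look in code: a variational lower bound on the mutual information between actions and future outcomes (in the style of the Barber-Agakov bound) used as an intrinsic bonus, learned jointly with a dynamics model. This is a minimal illustration under our own assumptions, not the paper's algorithm; the module names (policy, dynamics, posterior) and the coefficient beta are hypothetical.

```python
# Minimal sketch (not the paper's implementation): a variational lower bound on
# I(A; S' | S) >= E[ log q_psi(a | s, s') - log pi(a | s) ] used as an intrinsic
# bonus on top of the task reward, fitted jointly with a dynamics model
# p_theta(s' | s, a). All names and the coefficient beta are illustrative.
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Small MLP producing a diagonal Gaussian over a target vector."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * out_dim))

    def log_prob(self, x, target):
        mu, log_std = self.net(x).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mu, log_std.clamp(-5, 2).exp())
        return dist.log_prob(target).sum(-1)

s_dim, a_dim, beta = 4, 2, 0.1
policy = GaussianHead(s_dim, a_dim)              # pi(a | s)
dynamics = GaussianHead(s_dim + a_dim, s_dim)    # p_theta(s' | s, a)
posterior = GaussianHead(2 * s_dim, a_dim)       # q_psi(a | s, s')
opt = torch.optim.Adam(list(dynamics.parameters()) +
                       list(posterior.parameters()), lr=3e-4)

def intrinsic_bonus(s, a, s_next):
    """Variational MI bonus: log q_psi(a | s, s') - log pi(a | s)."""
    with torch.no_grad():
        return (posterior.log_prob(torch.cat([s, s_next], -1), a)
                - policy.log_prob(s, a))

def model_loss(s, a, s_next):
    """Jointly fit the dynamics model and the variational posterior."""
    nll_dyn = -dynamics.log_prob(torch.cat([s, a], -1), s_next).mean()
    nll_post = -posterior.log_prob(torch.cat([s, s_next], -1), a).mean()
    return nll_dyn + nll_post

# Usage on a dummy batch of transitions (s, a, s', r):
s, a = torch.randn(32, s_dim), torch.randn(32, a_dim)
s_next, r = torch.randn(32, s_dim), torch.randn(32)
opt.zero_grad(); model_loss(s, a, s_next).backward(); opt.step()
r_aug = r + beta * intrinsic_bonus(s, a, s_next)  # reward used by the policy update
```

In this sketch the augmented reward r + beta * (log q(a|s,s') - log pi(a|s)) plays the role of the intrinsically motivated objective; the paper's actual iteration scheme, choice of outcome variable, and convergence analysis are developed in the text itself.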
Related papers
- KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models [14.057527352653787]
We propose a Knowledge-aware Preference Optimization strategy, dubbed KnowPO, aimed at achieving adaptive knowledge selection.
We show that KnowPO outperforms previous methods for handling knowledge conflicts by over 37%.
arXiv Detail & Related papers (2024-08-06T16:55:54Z)
- A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning [48.59516337905877]
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents.
Recent work has developed theoretical insights into these algorithms.
We take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective.
arXiv Detail & Related papers (2024-06-04T07:22:12Z)
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- On Predictive planning and counterfactual learning in active inference [0.20482269513546453]
In this paper, we examine two decision-making schemes in active inference, based on 'planning' and 'learning from experience'.
We introduce a mixed model that navigates the data-complexity trade-off between these strategies.
We evaluate our proposed model in a challenging grid-world scenario that requires adaptability from the agent.
arXiv Detail & Related papers (2024-03-19T04:02:31Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Performative Reinforcement Learning [8.07595093287034]
We introduce the concept of performatively stable policy.
We show that repeatedly optimizing this objective converges to a performatively stable policy.
arXiv Detail & Related papers (2022-06-30T18:26:03Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.