Related papers: Inferring Transition Dynamics from Value Functions

Inferring Transition Dynamics from Value Functions

URL: http://arxiv.org/abs/2501.09081v1
Date: Wed, 15 Jan 2025 19:00:47 GMT
Title: Inferring Transition Dynamics from Value Functions
Authors: Jacob Adamczyk,
Abstract summary: In reinforcement learning, the value function is typically trained to solve the Bellman equation.<n>We show that a converged value function encodes a model of the underlying dynamics of the environment.
Score: 1.223779595809275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In reinforcement learning, the value function is typically trained to solve the Bellman equation, which connects the current value to future values. This temporal dependency hints that the value function may contain implicit information about the environment's transition dynamics. By rearranging the Bellman equation, we show that a converged value function encodes a model of the underlying dynamics of the environment. We build on this insight to propose a simple method for inferring dynamics models directly from the value function, potentially mitigating the need for explicit model learning. Furthermore, we explore the challenges of next-state identifiability, discussing conditions under which the inferred dynamics model is well-defined. Our work provides a theoretical foundation for leveraging value functions in dynamics modeling and opens a new avenue for bridging model-free and model-based reinforcement learning.

Related papers

Neural Networks Remember More: The Power of Parameter Isolation and Combination [3.2430260063115233]
Catastrophic forgetting is a pervasive issue for pre-trained language models. Key to solving this problem is to find a trade-off between the plasticity and stability of the model. We propose a novel method to achieve a balance between model stability and plasticity.
arXiv Detail & Related papers (2025-02-16T02:58:57Z)
ICODE: Modeling Dynamical Systems with Extrinsic Input Information [14.521146920900316]
We introduce emph Input Concomitant Neural ODEs (ICODEs), which incorporate precise real-time input information into the learning process of the models. We validate our method through experiments on several representative real dynamics. This work offers a valuable class of neural ODE models for understanding physical systems with explicit external input information.
arXiv Detail & Related papers (2024-11-21T07:57:59Z)
SOLD: Reinforcement Learning with Slot Object-Centric Latent Dynamics [16.020835290802548]
Slot-Attention for Object-centric Latent Dynamics is a novel algorithm that learns object-centric dynamics models from pixel inputs. We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over. Our results show that SOLD outperforms DreamerV3, a state-of-the-art model-based RL algorithm, across a range of benchmark robotic environments.
arXiv Detail & Related papers (2024-10-11T14:03:31Z)
Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data. We train the model using maximum likelihood estimation with Markov chain Monte Carlo. Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models. Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels. We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs [13.436770170612295]
We study for the first time uncertainty-aware modeling of continuous-time dynamics of interacting objects. Our model infers both independent dynamics and their interactions with reliable uncertainty estimates.
arXiv Detail & Related papers (2022-05-24T08:36:25Z)
Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting. We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z)
Explore the Context: Optimal Data Collection for Context-Conditional Dynamics Models [7.766117084613689]
We learn dynamics models for parametrized families of dynamical systems with varying properties. We compute an action sequence which, for a limited number of environment interactions, optimally explores the given system. We demonstrate the effectiveness of our method for exploration on a non-linear toy-problem and two well-known reinforcement learning environments.
arXiv Detail & Related papers (2021-02-22T22:52:39Z)
Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called textitmodel-invariance. This allows for generalization to novel combinations of unseen values of state variables. We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon. We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy. We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it. In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics. The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.