Reinforcement Learning with Exogenous States and Rewards
- URL: http://arxiv.org/abs/2303.12957v1
- Date: Wed, 22 Mar 2023 23:37:28 GMT
- Title: Reinforcement Learning with Exogenous States and Rewards
- Authors: George Trimponias and Thomas G. Dietterich
- Abstract summary: Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal.
This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into exogenous and endogenous components, the MDP can be decomposed into two processes.
- Score: 15.18610763024837
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Exogenous state variables and rewards can slow reinforcement learning by
injecting uncontrolled variation into the reward signal. This paper formalizes
exogenous state variables and rewards and shows that if the reward function
decomposes additively into endogenous and exogenous components, the MDP can be
decomposed into an exogenous Markov Reward Process (based on the exogenous
reward) and an endogenous Markov Decision Process (optimizing the endogenous
reward). Any optimal policy for the endogenous MDP is also an optimal policy
for the original MDP, but because the endogenous reward typically has reduced
variance, the endogenous MDP is easier to solve. We study settings where the
decomposition of the state space into exogenous and endogenous state spaces is
not given but must be discovered. The paper introduces and proves correctness
of algorithms for discovering the exogenous and endogenous subspaces of the
state space when they are mixed through linear combination. These algorithms
can be applied during reinforcement learning to discover the exogenous space,
remove the exogenous reward, and focus reinforcement learning on the endogenous
MDP. Experiments on a variety of challenging synthetic MDPs show that these
methods, applied online, discover large exogenous state spaces and produce
substantial speedups in reinforcement learning.
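As a concrete illustration of these two ideas, here is a minimal NumPy sketch; it is an assumed illustration, not the paper's algorithms. It searches for linear directions of the state whose one-step dynamics gain no predictive power from the action (a crude proxy for exogeneity), then fits a reward component on the resulting projection and subtracts it, leaving a lower-variance endogenous reward for the learner. The function names `candidate_exo_subspace` and `endogenous_reward`, and the eigenvalue threshold, are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's actual algorithms): (1) find linear
# directions of the state whose dynamics are not improved by conditioning on
# the action -- a rough proxy for an exogenous subspace -- and (2) fit and
# subtract a reward component on that subspace to reduce reward variance.
import numpy as np

def candidate_exo_subspace(S, A_onehot, S_next, tol=1e-3):
    """S: (n, d) states, A_onehot: (n, k) one-hot actions, S_next: (n, d) next states."""
    # Residuals of a linear prediction of s' from s alone.
    X0 = np.hstack([S, np.ones((len(S), 1))])
    R0 = S_next - X0 @ np.linalg.lstsq(X0, S_next, rcond=None)[0]
    # Residuals of a linear prediction of s' from (s, a).
    X1 = np.hstack([S, A_onehot, np.ones((len(S), 1))])
    R1 = S_next - X1 @ np.linalg.lstsq(X1, S_next, rcond=None)[0]
    # Along direction w, the action's extra predictive power is roughly w^T D w.
    D = np.cov(R0, rowvar=False) - np.cov(R1, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(D)
    # Directions where the action adds (almost) nothing are exogenous candidates.
    return eigvecs[:, eigvals < tol]   # (d, d_exo) basis of the candidate subspace

def endogenous_reward(S, S_next, R, W_exo):
    """Fit the reward on exogenous features and return the residual (endogenous) part."""
    F = np.hstack([S @ W_exo, S_next @ W_exo, np.ones((len(S), 1))])
    r_exo_hat = F @ np.linalg.lstsq(F, R, rcond=None)[0]
    return R - r_exo_hat               # lower-variance signal for the endogenous MDP
```

Per the abstract, subtracting a reward component that depends only on exogenous state does not change which policies are optimal, since that component is unaffected by the agent's actions; it only removes uncontrolled variance from the learning signal.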
Related papers
- Sub-DM:Subspace Diffusion Model with Orthogonal Decomposition for MRI Reconstruction [13.418240070456987]
Sub-DM is a subspace diffusion model that restricts the diffusion process via projections onto a subspace as the k-space data distribution evolves toward noise.
It circumvents the inference challenges posed by the complex and high-dimensional characteristics of k-space data.
It allows the diffusion processes in different spaces to refine models through a mutual feedback mechanism, enabling the learning of accurate priors even when dealing with complex k-space data.
arXiv Detail & Related papers (2024-11-06T08:33:07Z)
- Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning [44.17068570786194]
We study a class of structured Markov Decision Processes (MDPs) known as Exo-MDPs.
Exo-MDPs provide a natural model for various applications, including inventory control, portfolio management, power systems, and ride-sharing.
arXiv Detail & Related papers (2024-09-22T18:45:38Z)
- FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes [0.5735035463793008]
Inverse Reinforcement Learning (IRL) is a technique for revealing the rationale underlying the behavior of autonomous agents.
IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories.
We create a novel IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories.
arXiv Detail & Related papers (2023-06-17T18:28:03Z)
- Reconstructing Graph Diffusion History from a Single Snapshot [87.20550495678907]
We propose a novel barycenter formulation for reconstructing Diffusion history from A single SnapsHot (DASH).
We prove that the estimation error of diffusion parameters is unavoidable due to the NP-hardness of diffusion parameter estimation.
We also develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO).
arXiv Detail & Related papers (2023-06-01T09:39:32Z)
- Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Expert-Guided Symmetry Detection in Markov Decision Processes [0.0]
We propose a paradigm that aims to detect transformations of the state-action space under which the MDP dynamics are invariant.
The results show that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries.
arXiv Detail & Related papers (2021-11-19T16:12:30Z)
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)