Reinforcement Learning with Exogenous States and Rewards
- URL: http://arxiv.org/abs/2303.12957v1
- Date: Wed, 22 Mar 2023 23:37:28 GMT
- Title: Reinforcement Learning with Exogenous States and Rewards
- Authors: George Trimponias and Thomas G. Dietterich
- Abstract summary: Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal.
This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into exogenous and endogenous components, the MDP can be decomposed into two processes.
- Score: 15.18610763024837
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Exogenous state variables and rewards can slow reinforcement learning by
injecting uncontrolled variation into the reward signal. This paper formalizes
exogenous state variables and rewards and shows that if the reward function
decomposes additively into endogenous and exogenous components, the MDP can be
decomposed into an exogenous Markov Reward Process (based on the exogenous
reward) and an endogenous Markov Decision Process (optimizing the endogenous
reward). Any optimal policy for the endogenous MDP is also an optimal policy
for the original MDP, but because the endogenous reward typically has reduced
variance, the endogenous MDP is easier to solve. We study settings where the
decomposition of the state space into exogenous and endogenous state spaces is
not given but must be discovered. The paper introduces and proves correctness
of algorithms for discovering the exogenous and endogenous subspaces of the
state space when they are mixed through linear combination. These algorithms
can be applied during reinforcement learning to discover the exogenous space,
remove the exogenous reward, and focus reinforcement learning on the endogenous
MDP. Experiments on a variety of challenging synthetic MDPs show that these
methods, applied online, discover large exogenous state spaces and produce
substantial speedups in reinforcement learning.
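As a concrete illustration of these two ideas, here is a minimal NumPy sketch; it is an assumed illustration, not the paper's algorithms. It searches for linear directions of the state whose one-step dynamics gain no predictive power from the action (a crude proxy for exogeneity), then fits a reward component on the resulting projection and subtracts it, leaving a lower-variance endogenous reward for the learner. The function names `candidate_exo_subspace` and `endogenous_reward`, and the eigenvalue threshold, are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's actual algorithms): (1) find linear
# directions of the state whose dynamics are not improved by conditioning on
# the action -- a rough proxy for an exogenous subspace -- and (2) fit and
# subtract a reward component on that subspace to reduce reward variance.
import numpy as np

def candidate_exo_subspace(S, A_onehot, S_next, tol=1e-3):
    """S: (n, d) states, A_onehot: (n, k) one-hot actions, S_next: (n, d) next states."""
    # Residuals of a linear prediction of s' from s alone.
    X0 = np.hstack([S, np.ones((len(S), 1))])
    R0 = S_next - X0 @ np.linalg.lstsq(X0, S_next, rcond=None)[0]
    # Residuals of a linear prediction of s' from (s, a).
    X1 = np.hstack([S, A_onehot, np.ones((len(S), 1))])
    R1 = S_next - X1 @ np.linalg.lstsq(X1, S_next, rcond=None)[0]
    # Along direction w, the action's extra predictive power is roughly w^T D w.
    D = np.cov(R0, rowvar=False) - np.cov(R1, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(D)
    # Directions where the action adds (almost) nothing are exogenous candidates.
    return eigvecs[:, eigvals < tol]   # (d, d_exo) basis of the candidate subspace

def endogenous_reward(S, S_next, R, W_exo):
    """Fit the reward on exogenous features and return the residual (endogenous) part."""
    F = np.hstack([S @ W_exo, S_next @ W_exo, np.ones((len(S), 1))])
    r_exo_hat = F @ np.linalg.lstsq(F, R, rcond=None)[0]
    return R - r_exo_hat               # lower-variance signal for the endogenous MDP
```

Per the abstract, subtracting a reward component that depends only on exogenous state does not change which policies are optimal, since that component is unaffected by the agent's actions; it only removes uncontrolled variance from the learning signal.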
Related papers
- Sub-DM:Subspace Diffusion Model with Orthogonal Decomposition for MRI Reconstruction [13.418240070456987]
Sub-DM is a subspace diffusion model that restricts the diffusion process via projections onto a subspace as the k-space data distribution evolves toward noise.
It circumvents the inference challenges posed by the complex and high-dimensional characteristics of k-space data.
It allows the diffusion processes in different spaces to refine models through a mutual feedback mechanism, enabling the learning of accurate priors even when dealing with complex k-space data.
arXiv Detail & Related papers (2024-11-06T08:33:07Z)
- Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning [44.17068570786194]
We study a class of structured Markov Decision Processes (MDPs) known as Exo-MDPs.
Exo-MDPs provide a natural model for various applications, including inventory control, portfolio management, power systems, and ride-sharing.
arXiv Detail & Related papers (2024-09-22T18:45:38Z)
- FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes [0.5735035463793008]
Inverse Reinforcement Learning (IRL) is a technique for revealing the rationale underlying the behavior of autonomous agents.
IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories.
We create a novel IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories.
arXiv Detail & Related papers (2023-06-17T18:28:03Z)
- Reconstructing Graph Diffusion History from a Single Snapshot [87.20550495678907]
We propose a novel barycenter formulation for reconstructing Diffusion history from A single SnapsHot (DASH).
We prove that the estimation error of diffusion parameters is unavoidable due to the NP-hardness of diffusion parameter estimation.
We also develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO).
arXiv Detail & Related papers (2023-06-01T09:39:32Z)
- Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Expert-Guided Symmetry Detection in Markov Decision Processes [0.0]
We propose a paradigm that aims to detect transformations of the state-action space under which the MDP dynamics are invariant.
The results show that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries.
arXiv Detail & Related papers (2021-11-19T16:12:30Z)
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)