An Idiosyncrasy of Time-discretization in Reinforcement Learning
- URL: http://arxiv.org/abs/2406.14951v2
- Date: Mon, 2 Sep 2024 04:13:50 GMT
- Title: An Idiosyncrasy of Time-discretization in Reinforcement Learning
- Authors: Kris De Asis, Richard S. Sutton
- Abstract summary: We study how the choice of discretization may affect a reinforcement learning algorithm.
We acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment.
- Score: 7.085780872622857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessitating the study of how the choice of discretization may affect a reinforcement learning algorithm. In this work, we consider the relationship between the definitions of the continuous-time and discrete-time returns. Specifically, we acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment, and note how a simple modification can better align the return definitions. This observation is of practical consideration when dealing with environments where time-discretization granularity is a choice, or situations where such granularity is inherently stochastic.
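To make the mismatch concrete, below is a minimal, hypothetical Python sketch (not taken from the paper) comparing a continuous-time discounted return with a naive discrete-time return and with a time-aware variant that discounts per unit of elapsed time. The constant reward rate and the specific correction used here are illustrative assumptions; the paper's own modification may differ in detail.

```python
import math

GAMMA = 0.99   # discount per unit of *time*, not per step (assumed)
R = 1.0        # constant reward rate per unit of time (assumed)

def continuous_return():
    # Closed form of the continuous-time return for a constant reward rate:
    # G = integral_0^inf GAMMA^t * R dt = R / (-ln GAMMA)
    return R / -math.log(GAMMA)

def naive_discrete_return(dt):
    # Naive: accumulate R*dt per step but discount by GAMMA *per step*,
    # ignoring that each step now spans dt units of time.
    # G = sum_k GAMMA^k * R*dt = R*dt / (1 - GAMMA)
    return R * dt / (1.0 - GAMMA)

def aligned_discrete_return(dt):
    # Time-aware: discount per unit of elapsed time (GAMMA**dt per step) and
    # integrate the continuously discounted reward over each step:
    # integral_0^dt GAMMA^u * R du = R * (GAMMA**dt - 1) / ln GAMMA
    step_reward = R * (GAMMA ** dt - 1.0) / math.log(GAMMA)
    return step_reward / (1.0 - GAMMA ** dt)

print(f"continuous: {continuous_return():.4f}")
for dt in (0.1, 0.5, 1.0, 2.0):
    print(f"dt={dt}: naive={naive_discrete_return(dt):8.4f}  "
          f"aligned={aligned_discrete_return(dt):8.4f}")
```

Under these assumptions, the naive return scales linearly with the step size, while the time-aware return matches the continuous integral regardless of dt, which is the kind of misalignment between return definitions the abstract describes.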
Related papers
- Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making [66.27188304203217]
Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning.
Prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these quantities do not satisfy the triangle inequality.
We show how successor features learned by contrastive learning form a temporal distance that does satisfy the triangle inequality.
arXiv Detail & Related papers (2024-06-24T19:36:45Z) - When and How: Learning Identifiable Latent States for Nonstationary Time Series Forecasting [22.915008205203886]
We learn IDentifiable latEnt stAtes (IDEA) to detect when the distribution shifts occur.
We further disentangle the stationary and nonstationary latent states via a sufficient-observation assumption to learn how the latent states change.
Based on these theories, we devise the IDEA model, which incorporates an autoregressive hidden Markov model to estimate latent environments.
arXiv Detail & Related papers (2024-02-20T07:16:12Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z) - Reconstructing a dynamical system and forecasting time series by self-consistent deep learning [4.947248396489835]
We introduce a self-consistent deep-learning framework for a noisy deterministic time series.
It provides unsupervised filtering, state-space reconstruction, identification of the underlying differential equations and forecasting.
arXiv Detail & Related papers (2021-08-04T06:10:58Z) - A Temporal Kernel Approach for Deep Learning with Continuous-time Information [18.204325860752768]
Sequential deep learning models such as RNNs, causal CNNs, and attention mechanisms do not readily consume continuous-time information.
Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes.
We provide a principled way to characterize continuous-time systems using deep learning tools.
arXiv Detail & Related papers (2021-03-28T20:13:53Z) - Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z) - POMDPs in Continuous Time and Discrete Spaces [28.463792234064805]
We consider the problem of optimal decision making in continuous-time systems with discrete state and action spaces under partial observability.
We give a mathematical description of a continuous-time partially observable Markov decision process (POMDP).
We present (i) an approach that solves the decision problem offline by learning an approximation of the value function, and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning.
arXiv Detail & Related papers (2020-10-02T14:04:32Z) - CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations [72.4716073597902]
We propose a method to learn object-centric Canonical Spatiotemporal Point Cloud Representations of dynamically moving or evolving objects.
We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation.
arXiv Detail & Related papers (2020-08-06T17:58:48Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)