OCMDP: Observation-Constrained Markov Decision Process
- URL: http://arxiv.org/abs/2411.07087v2
- Date: Tue, 12 Nov 2024 12:03:07 GMT
- Title: OCMDP: Observation-Constrained Markov Decision Process
- Authors: Taiyi Wang, Jianheng Liu, Bryan Lee, Zhihao Wu, Yu Wu
- Abstract summary: We tackle the challenge of simultaneously learning observation and control strategies in cost-sensitive environments.
We develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy.
We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole.
- Abstract: In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. In both scenarios, the experimental results demonstrate that our model achieves a substantial reduction in average observation costs while significantly outperforming baseline methods in efficiency.
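To make the decomposition concrete, below is a minimal sketch of one interaction step in such a setting. The toy environment, the fixed observation cost, and the placeholder sensing and control rules are all illustrative assumptions, not the paper's algorithm.

```python
import random

OBS_COST = 0.1  # hypothetical fixed cost per observation

class ToyEnv:
    """Stand-in environment: the true state is hidden unless paid for."""
    def __init__(self):
        self.state = 0.0
    def observe(self):
        return self.state  # costly: reveals the true state
    def step(self, action):
        self.state += random.uniform(-1, 1) + action
        return -abs(self.state)  # task reward: keep the state near zero

def ocmdp_step(env, belief):
    # Sensing head: decide *whether* to buy an observation.
    observe = random.random() < 0.5          # placeholder sensing policy
    if observe:
        belief = env.observe()               # belief collapses to the state
    # Control head: decide *what* to do given current information.
    action = -0.5 * belief                   # placeholder control policy
    reward = env.step(action) - (OBS_COST if observe else 0.0)
    return belief, reward

env, belief = ToyEnv(), 0.0
for _ in range(10):
    belief, reward = ocmdp_step(env, belief)
```

The point of the split is that each head faces a smaller decision: the sensing head only answers "pay for a measurement or not," while the control head acts on whatever information is currently available.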
Related papers
- Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation [24.984938229619075]
Reinforcement Learning has revolutionized decision-making processes in dynamic environments.
The lack of precise environmental information makes it challenging to provide clear feedback signals.
We develop a self-feedback mechanism for autonomous goal detection and cessation upon task completion.
arXiv Detail & Related papers (2024-09-14T21:42:17Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
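A rough sketch of the coarse-to-fine idea; the schedule, the bin counts, and the decomposed-critic comment are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def action_grid(low, high, bins):
    """Discretize one continuous action dimension into `bins` levels."""
    return np.linspace(low, high, bins)

bins = 2  # start coarse: bang-bang control
for step in range(100_000):
    grid = action_grid(-1.0, 1.0, bins)
    # A critic-only agent would pick argmax_a Q(s, a) over `grid` here;
    # with value decomposition, Q(s, a) = sum_i Q_i(s, a_i) per dimension,
    # so the argmax factorizes and stays cheap as `bins` grows.
    if step > 0 and step % 25_000 == 0 and bins < 16:
        bins *= 2  # grow resolution: 2 -> 4 -> 8 -> 16 levels
```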
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that is transparent by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z)
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
- Online Modeling and Monitoring of Dependent Processes under Resource Constraints [11.813520177037763]
The proposed method designs a collaborative learning-based upper confidence bound (CL-UCB) algorithm to optimally balance the exploitation and exploration of dependent processes under limited resources.
The efficiency of the proposed method is demonstrated through theoretical analysis, simulation studies, and an empirical study of adaptive cognitive monitoring in Alzheimer's disease.
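For intuition, a generic budgeted UCB selection rule might look like the sketch below; the paper's CL-UCB additionally shares statistics across dependent processes, which this simplified version omits.

```python
import math

def ucb_select(means, counts, t, budget):
    """Pick `budget` processes to monitor this round by UCB score."""
    scores = [
        m + math.sqrt(2 * math.log(t + 1) / n) if n > 0 else float("inf")
        for m, n in zip(means, counts)
    ]
    # Monitor the processes with the highest upper confidence bounds.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:budget]

# e.g. 5 processes, a budget of 2 sensors, round t=10:
print(ucb_select([0.2, 0.5, 0.1, 0.4, 0.3], [3, 5, 0, 4, 2], 10, 2))
```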
arXiv Detail & Related papers (2023-07-26T14:14:38Z)
- Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon [2.456909016197174]
Safety-critical cyber-physical systems require robust control strategies against adversarial disturbances and modeling uncertainties.
We present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon.
arXiv Detail & Related papers (2023-03-28T21:40:06Z)
- Reinforcement Learning under Partial Observability Guided by Learned Environment Models [1.1470070927586016]
We propose an approach for reinforcement learning (RL) in partially observable environments.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes.
We report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques.
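A minimal sketch of the combination, assuming states of a learned model (e.g. from IoAlergia) serve as the abstraction over which tabular Q-learning runs; names and constants are illustrative.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)  # keyed by (abstract_state, action)

def q_update(s_abs, action, reward, s_abs_next, actions):
    """One tabular Q-learning update over learned-model states.

    `s_abs` would be the current state of an MDP learned from traces,
    tracked alongside the real environment; the learned model
    disambiguates observations that look identical to the agent.
    """
    best_next = max(Q[(s_abs_next, a)] for a in actions)
    Q[(s_abs, action)] += ALPHA * (reward + GAMMA * best_next - Q[(s_abs, action)])

q_update("m0", "left", 1.0, "m1", ["left", "right"])
```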
arXiv Detail & Related papers (2022-06-23T13:55:13Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
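One illustrative way such an interrelation might be used, assuming a known Lyapunov function V and a GP posterior over its value at the next state; the beta-sigma margin rule below is a guess at the flavor of the condition, not the paper's exact criterion.

```python
def lyapunov_decrease_certified(v_now, v_next_mean, gp_std, beta=2.0):
    """Certify a Lyapunov decrease despite model uncertainty.

    The GP posterior gives a mean prediction of V at the next state and
    a standard deviation; the decrease is certified only if it survives
    a beta-sigma worst case, so more data (smaller gp_std) certifies more.
    """
    return v_next_mean + beta * gp_std < v_now

print(lyapunov_decrease_certified(v_now=1.0, v_next_mean=0.8, gp_std=0.05))  # True
```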
arXiv Detail & Related papers (2020-11-20T19:10:01Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic's estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
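A generic instance of this variance-reduction idea for a softmax policy over discrete actions (illustrative; the paper's estimator may differ):

```python
import numpy as np

def all_action_gradient(logits, q_values):
    """Gradient of E_{a~softmax(logits)}[Q(s,a)] w.r.t. the logits.

    Summing over *all* discrete actions, weighted by the policy, avoids
    the sampling variance of a single-action REINFORCE estimate.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    expected_q = probs @ q_values
    return probs * (q_values - expected_q)

print(all_action_gradient(np.zeros(3), np.array([1.0, 0.0, -1.0])))
```

Averaging the critic over all actions removes the variance that a single sampled action would otherwise contribute to the gradient estimate.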
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.