OCMDP: Observation-Constrained Markov Decision Process
- URL: http://arxiv.org/abs/2411.07087v4
- Date: Thu, 23 Jan 2025 18:08:31 GMT
- Title: OCMDP: Observation-Constrained Markov Decision Process
- Authors: Taiyi Wang, Jianheng Liu, Bryan Lee, Zhihao Wu, Yu Wu,
- Abstract summary: We tackle the challenge of simultaneously learning observation and control strategies in cost-sensitive environments.
We develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy.
We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole.
- Abstract: In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. In both scenarios, the experimental results demonstrate that our model achieves a substantial average reduction in observation costs and significantly outperforms baseline methods in efficiency.
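The key idea in the abstract is that the action space factors into a sensing decision (when and what to observe, at a cost) and a control decision. The sketch below is a minimal toy illustration of that decomposition, not the paper's algorithm: the environment, policy names, and the observation cost value are assumptions introduced for illustration, and the hand-written heuristic policies stand in for the learned deep RL components.

```python
# Illustrative sketch only: ObservationConstrainedEnv, sensing_policy, control_policy,
# and OBSERVE_COST are hypothetical names, not the paper's code. The sketch shows how
# a factored (observe, control) action and a per-step observation cost could look.
import random

OBSERVE_COST = 0.1  # assumed cost charged whenever the agent chooses to observe


class ObservationConstrainedEnv:
    """Toy chain MDP whose true state is revealed only when the agent pays to observe."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = random.randrange(self.n_states)
        return None  # state stays hidden until the agent observes

    def step(self, observe, control):
        # control in {-1, +1} moves along the chain; reward for reaching the last state
        self.state = max(0, min(self.n_states - 1, self.state + control))
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        if observe:
            reward -= OBSERVE_COST  # sensing is charged against the return
            obs = self.state        # the true state is revealed this step
        else:
            obs = None              # nothing is revealed this step
        return obs, reward


def sensing_policy(belief_age):
    """Decide *when* to observe: here, a fixed heuristic that observes every third step."""
    return belief_age >= 3


def control_policy(last_obs):
    """Decide the control action from the most recent observation (or act blindly)."""
    return 1 if last_obs is not None else random.choice([-1, 1])


# A single rollout querying the two policy components separately, mirroring the
# decomposition of the combined action space into sensing and control decisions.
env = ObservationConstrainedEnv()
env.reset()
last_obs, belief_age, ret = None, 0, 0.0
for t in range(20):
    observe = sensing_policy(belief_age)
    obs, r = env.step(observe, control_policy(last_obs))
    ret += r
    last_obs, belief_age = (obs, 0) if observe else (last_obs, belief_age + 1)
print(f"return net of observation costs: {ret:.2f}")
```

In the paper's method both components would instead be learned iteratively with model-free deep RL; the sketch only illustrates how an observation cost enters the return and how the sensing and control decisions are queried separately.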
Related papers
- Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective [59.61868506896214]
We show that under standard data coverage assumptions, reinforcement learning through outcome supervision is no more statistically difficult than through process supervision.
We prove that any policy's advantage function can serve as an optimal process reward model.
arXiv Detail & Related papers (2025-02-14T22:21:56Z) - Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation [24.984938229619075]
Reinforcement Learning has revolutionized decision-making processes in dynamic environments.
The lack of precise environmental information makes it challenging to provide clear feedback signals.
We develop a self-feedback mechanism for autonomous goal detection and cessation upon task completion.
arXiv Detail & Related papers (2024-09-14T21:42:17Z) - Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that is transparent by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z) - Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z) - Online Modeling and Monitoring of Dependent Processes under Resource Constraints [11.813520177037763]
The proposed method designs a collaborative learning-based upper confidence bound (CL-UCB) algorithm to optimally balance the exploitation and exploration of dependent processes under limited resources.
The efficiency of the proposed method is demonstrated through theoretical analysis, simulation studies, and an empirical study of adaptive cognitive monitoring in Alzheimer's disease.
arXiv Detail & Related papers (2023-07-26T14:14:38Z) - Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon [2.456909016197174]
Safety-critical cyber-physical systems require robust control strategies against adversarial disturbances and modeling uncertainties.
We present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon.
arXiv Detail & Related papers (2023-03-28T21:40:06Z) - Reinforcement Learning under Partial Observability Guided by Learned Environment Models [1.1470070927586016]
We propose an approach for reinforcement learning (RL) in partially observable environments.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes.
We report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques.
arXiv Detail & Related papers (2022-06-23T13:55:13Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
arXiv Detail & Related papers (2020-11-20T19:10:01Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of the gradient estimate.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)