OCMDP: Observation-Constrained Markov Decision Process
- URL: http://arxiv.org/abs/2411.07087v2
- Date: Tue, 12 Nov 2024 12:03:07 GMT
- Title: OCMDP: Observation-Constrained Markov Decision Process
- Authors: Taiyi Wang, Jianheng Liu, Bryan Lee, Zhihao Wu, Yu Wu
- Abstract summary: We tackle the challenge of simultaneously learning observation and control strategies in cost-sensitive environments.
We develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy.
We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole.
- Abstract: In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. In both scenarios, the experimental results demonstrate that our model achieves a substantial reduction in average observation costs while significantly outperforming baseline methods in efficiency.
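To make the decomposition concrete, below is a minimal sketch of one interaction step in such a setting. The toy environment, the fixed observation cost, and the placeholder sensing and control rules are all illustrative assumptions, not the paper's algorithm.

```python
import random

OBS_COST = 0.1  # hypothetical fixed cost per observation

class ToyEnv:
    """Stand-in environment: the true state is hidden unless paid for."""
    def __init__(self):
        self.state = 0.0
    def observe(self):
        return self.state  # costly: reveals the true state
    def step(self, action):
        self.state += random.uniform(-1, 1) + action
        return -abs(self.state)  # task reward: keep the state near zero

def ocmdp_step(env, belief):
    # Sensing head: decide *whether* to buy an observation.
    observe = random.random() < 0.5          # placeholder sensing policy
    if observe:
        belief = env.observe()               # belief collapses to the state
    # Control head: decide *what* to do given current information.
    action = -0.5 * belief                   # placeholder control policy
    reward = env.step(action) - (OBS_COST if observe else 0.0)
    return belief, reward

env, belief = ToyEnv(), 0.0
for _ in range(10):
    belief, reward = ocmdp_step(env, belief)
```

The point of the split is that each head faces a smaller decision: the sensing head only answers "pay for a measurement or not," while the control head acts on whatever information is currently available.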
Related papers
- Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation [24.984938229619075]
Reinforcement Learning has revolutionized decision-making processes in dynamic environments.
The lack of precise environmental information makes it challenging to provide clear feedback signals.
We develop a self-feedback mechanism for autonomous goal detection and cessation upon task completion.
arXiv Detail & Related papers (2024-09-14T21:42:17Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
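A rough sketch of the coarse-to-fine idea; the schedule, the bin counts, and the decomposed-critic comment are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def action_grid(low, high, bins):
    """Discretize one continuous action dimension into `bins` levels."""
    return np.linspace(low, high, bins)

bins = 2  # start coarse: bang-bang control
for step in range(100_000):
    grid = action_grid(-1.0, 1.0, bins)
    # A critic-only agent would pick argmax_a Q(s, a) over `grid` here;
    # with value decomposition, Q(s, a) = sum_i Q_i(s, a_i) per dimension,
    # so the argmax factorizes and stays cheap as `bins` grows.
    if step > 0 and step % 25_000 == 0 and bins < 16:
        bins *= 2  # grow resolution: 2 -> 4 -> 8 -> 16 levels
```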
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that is transparent by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z)
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
- Online Modeling and Monitoring of Dependent Processes under Resource Constraints [11.813520177037763]
The proposed method designs a collaborative learning-based upper confidence bound (CL-UCB) algorithm to optimally balance the exploitation and exploration of dependent processes under limited resources.
The efficiency of the proposed method is demonstrated through theoretical analysis, simulation studies, and an empirical study of adaptive cognitive monitoring in Alzheimer's disease.
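For intuition, a generic budgeted UCB selection rule might look like the sketch below; the paper's CL-UCB additionally shares statistics across dependent processes, which this simplified version omits.

```python
import math

def ucb_select(means, counts, t, budget):
    """Pick `budget` processes to monitor this round by UCB score."""
    scores = [
        m + math.sqrt(2 * math.log(t + 1) / n) if n > 0 else float("inf")
        for m, n in zip(means, counts)
    ]
    # Monitor the processes with the highest upper confidence bounds.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:budget]

# e.g. 5 processes, a budget of 2 sensors, round t=10:
print(ucb_select([0.2, 0.5, 0.1, 0.4, 0.3], [3, 5, 0, 4, 2], 10, 2))
```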
arXiv Detail & Related papers (2023-07-26T14:14:38Z)
- Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon [2.456909016197174]
Safety-critical cyber-physical systems require robust control strategies against adversarial disturbances and modeling uncertainties.
We present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon.
arXiv Detail & Related papers (2023-03-28T21:40:06Z)
- Reinforcement Learning under Partial Observability Guided by Learned Environment Models [1.1470070927586016]
We propose an approach for reinforcement learning (RL) in partially observable environments.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes.
We report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques.
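A minimal sketch of the combination, assuming states of a learned model (e.g. from IoAlergia) serve as the abstraction over which tabular Q-learning runs; names and constants are illustrative.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)  # keyed by (abstract_state, action)

def q_update(s_abs, action, reward, s_abs_next, actions):
    """One tabular Q-learning update over learned-model states.

    `s_abs` would be the current state of an MDP learned from traces,
    tracked alongside the real environment; the learned model
    disambiguates observations that look identical to the agent.
    """
    best_next = max(Q[(s_abs_next, a)] for a in actions)
    Q[(s_abs, action)] += ALPHA * (reward + GAMMA * best_next - Q[(s_abs, action)])

q_update("m0", "left", 1.0, "m1", ["left", "right"])
```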
arXiv Detail & Related papers (2022-06-23T13:55:13Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- The Impact of Data on the Stability of Learning-Based Control - Extended Version [63.97366815968177]
We propose a Lyapunov-based measure for quantifying the impact of data on the certifiable control performance.
By modeling unknown system dynamics through Gaussian processes, we can determine the interrelation between model uncertainty and satisfaction of stability conditions.
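One illustrative way such an interrelation might be used, assuming a known Lyapunov function V and a GP posterior over its value at the next state; the beta-sigma margin rule below is a guess at the flavor of the condition, not the paper's exact criterion.

```python
def lyapunov_decrease_certified(v_now, v_next_mean, gp_std, beta=2.0):
    """Certify a Lyapunov decrease despite model uncertainty.

    The GP posterior gives a mean prediction of V at the next state and
    a standard deviation; the decrease is certified only if it survives
    a beta-sigma worst case, so more data (smaller gp_std) certifies more.
    """
    return v_next_mean + beta * gp_std < v_now

print(lyapunov_decrease_certified(v_now=1.0, v_next_mean=0.8, gp_std=0.05))  # True
```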
arXiv Detail & Related papers (2020-11-20T19:10:01Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic's estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
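A generic instance of this variance-reduction idea for a softmax policy over discrete actions (illustrative; the paper's estimator may differ):

```python
import numpy as np

def all_action_gradient(logits, q_values):
    """Gradient of E_{a~softmax(logits)}[Q(s,a)] w.r.t. the logits.

    Summing over *all* discrete actions, weighted by the policy, avoids
    the sampling variance of a single-action REINFORCE estimate.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    expected_q = probs @ q_values
    return probs * (q_values - expected_q)

print(all_action_gradient(np.zeros(3), np.array([1.0, 0.0, -1.0])))
```

Averaging the critic over all actions removes the variance that a single sampled action would otherwise contribute to the gradient estimate.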
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.