Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty
- URL: http://arxiv.org/abs/2109.06312v1
- Date: Mon, 13 Sep 2021 20:43:10 GMT
- Title: Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty
- Authors: Shalmali Joshi and Sonali Parbhoo and Finale Doshi-Velez
- Abstract summary: We propose SLTD ('Sequential Learning-to-Defer') as a framework for learning-to-defer pre-emptively to an expert in sequential decision-making settings.
SLTD estimates the likelihood that deferring now, rather than later, improves value, based on the underlying uncertainty in the dynamics.
- Score: 35.077494648756876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose SLTD ('Sequential Learning-to-Defer'), a framework for
learning-to-defer pre-emptively to an expert in sequential decision-making
settings. SLTD estimates the likelihood that deferring now, rather than later,
improves value, based on the underlying uncertainty in the dynamics. In particular, we
focus on the non-stationarity in the dynamics to accurately learn the deferral
policy. We demonstrate that our pre-emptive deferral can identify regions where the
current policy has a low probability of improving outcomes. SLTD outperforms
existing non-sequential learning-to-defer baselines, whilst reducing overall
uncertainty on multiple synthetic and real-world simulators with non-stationary
dynamics. We further derive and decompose the propagated (long-term)
uncertainty for interpretation by the domain expert to provide an indication of
when the model's performance is reliable.
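As a rough illustration of the deferral criterion described in the abstract, the sketch below compares posterior value samples for the learned policy and the expert; the function names, Monte-Carlo framing, and threshold are our own illustrative assumptions, not SLTD's actual implementation:

```python
import numpy as np

def prob_defer_improves(v_policy_samples, v_expert_samples):
    """Probability, under posterior samples of the (uncertain) dynamics,
    that deferring to the expert yields a higher value than continuing
    with the current policy."""
    v_policy = np.asarray(v_policy_samples, dtype=float)
    v_expert = np.asarray(v_expert_samples, dtype=float)
    return float(np.mean(v_expert > v_policy))

def should_defer(v_policy_samples, v_expert_samples, threshold=0.8):
    """Defer pre-emptively when an improvement is sufficiently likely."""
    return prob_defer_improves(v_policy_samples, v_expert_samples) >= threshold
```

A pre-emptive scheme of this shape asks the question at every step, so deferral can happen before the policy enters a region where its value estimates become unreliable.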
Related papers
- Adversary-Free Counterfactual Prediction via Information-Regularized Representations [8.760019957506719]
We study counterfactual prediction under decoder bias and propose a mathematically grounded, information-theoretic approach. We derive a tractable variational objective that upper-bounds the information term and couples it with a supervised assignment, yielding a stable, provably motivated training criterion. We evaluate the method on controlled numerical simulations and a real-world clinical dataset, comparing against recent state-of-the-art balancing, reweighting, and adversarial baselines.
arXiv Detail & Related papers (2025-10-17T09:49:04Z) - Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation [68.106428321492]
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. We present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately.
arXiv Detail & Related papers (2025-10-09T10:26:58Z) - On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances [3.701656361145375]
We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming.
arXiv Detail & Related papers (2025-09-20T17:14:27Z) - Adaptive Variance-Penalized Continual Learning with Fisher Regularization [0.0]
This work presents a novel continual learning framework that integrates Fisher-weighted asymmetric regularization of parameter variances. Our method dynamically modulates regularization intensity according to parameter uncertainty, achieving enhanced stability and performance.
arXiv Detail & Related papers (2025-08-15T21:49:28Z) - Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning [4.902161835372679]
We propose a novel framework for uncertainty-aware policy optimization with model-based exploratory planning.
In the policy optimization phase, we leverage an uncertainty-driven exploratory policy to actively collect diverse training samples.
Our approach offers flexibility and applicability to tasks with varying state/action spaces and reward structures.
arXiv Detail & Related papers (2025-03-26T01:07:35Z) - Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization [11.642505299142956]
Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. We show that performing on-policy reinforcement learning with an evidential critic provides both of these properties. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty in both the policy evaluation and policy improvement stages.
arXiv Detail & Related papers (2025-03-03T12:23:07Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Rich-Observation Reinforcement Learning with Continuous Latent Dynamics [43.84391209459658]
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations.
Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient.
arXiv Detail & Related papers (2024-05-29T17:02:49Z) - Pausing Policy Learning in Non-stationary Reinforcement Learning [23.147618992106867]
We challenge the common belief that continually updating the decision is optimal for minimizing the temporal gap.
We propose an online reinforcement learning framework with forecasting and show that strategically pausing decision updates yields better overall performance.
arXiv Detail & Related papers (2024-05-25T04:38:09Z) - Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness [30.44174123736964]
We introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML.
Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches.
arXiv Detail & Related papers (2024-02-19T17:44:35Z) - The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
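For intuition, a single quantile temporal-difference update can be sketched as follows; the midpoint quantile levels, fixed learning rate, and sampled Bellman target are standard QTD ingredients, but this exact form is our simplification, not the paper's analysis:

```python
import numpy as np

def qtd_update(quantiles, reward, next_quantiles, gamma=0.99, lr=0.1):
    """One quantile temporal-difference step for a single state.

    Each quantile estimate q_i at level tau_i moves in the direction of
    the asymmetric quantile-regression gradient: +tau_i when the sampled
    Bellman target lies above q_i, tau_i - 1 when it lies below."""
    quantiles = np.asarray(quantiles, dtype=float)
    m = len(quantiles)
    taus = (np.arange(m) + 0.5) / m  # midpoint quantile levels
    target = reward + gamma * np.random.choice(next_quantiles)
    grad = np.where(target > quantiles, taus, taus - 1.0)
    return quantiles + lr * grad
```

Because the update only uses the sign of the target relative to each quantile, it is robust to heavy-tailed return noise, which is one source of the statistical benefit the paper analyses.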
arXiv Detail & Related papers (2023-05-28T10:52:46Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
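The fixed-point view behind such equations can be sketched in a few lines; the tabular setting, the `local_var` input, and the mean transition matrix are assumptions for illustration rather than the paper's exact operator:

```python
import numpy as np

def uncertainty_bellman_step(u, local_var, P_mean, gamma=0.99):
    """One fixed-point iteration of a tabular uncertainty Bellman equation:
    u(s) <- local_var(s) + gamma^2 * sum_s' P_mean(s' | s) * u(s').
    Iterating to convergence propagates local (one-step) value variance
    into a long-horizon variance over values."""
    u = np.asarray(u, dtype=float)
    return np.asarray(local_var, dtype=float) + gamma**2 * np.asarray(P_mean) @ u
```

Note the gamma-squared discount: variances, unlike values, scale with the square of the return's discount factor.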
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - ESC-Rules: Explainable, Semantically Constrained Rule Sets [11.160515561004619]
We describe a novel approach to explainable prediction of a continuous variable based on learning fuzzy weighted rules.
Our model trains a set of weighted rules to maximise prediction accuracy and minimise an ontology-based 'semantic loss' function.
This system fuses quantitative sub-symbolic learning with symbolic learning and constraints based on domain knowledge.
arXiv Detail & Related papers (2022-08-26T09:29:30Z) - A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z) - DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is the part of out-of-sample prediction error that is due to the learner's lack of knowledge.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
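The subtraction described above amounts to a one-line decomposition; clipping at zero is our own assumption to keep the estimate non-negative, not a detail taken from the paper:

```python
def epistemic_uncertainty(predicted_generalization_error, aleatoric_estimate):
    """DEUP-style decomposition (sketch): epistemic uncertainty is the
    predicted out-of-sample error minus an estimate of irreducible
    (aleatoric) noise, clipped at zero."""
    return max(predicted_generalization_error - aleatoric_estimate, 0.0)
```

In practice both inputs would themselves be learned predictors; the sketch only shows how the two error estimates combine.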
arXiv Detail & Related papers (2021-02-16T23:50:35Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.