Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty
- URL: http://arxiv.org/abs/2109.06312v1
- Date: Mon, 13 Sep 2021 20:43:10 GMT
- Title: Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty
- Authors: Shalmali Joshi and Sonali Parbhoo and Finale Doshi-Velez
- Abstract summary: We propose SLTD (`Sequential Learning-to-Defer'), a framework for learning-to-defer pre-emptively to an expert in sequential decision-making settings.
SLTD measures the likelihood that deferring now, rather than later, will improve value, based on the underlying uncertainty in the dynamics.
- Score: 35.077494648756876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose SLTD (`Sequential Learning-to-Defer'), a framework for
learning-to-defer pre-emptively to an expert in sequential decision-making
settings. SLTD measures the likelihood that deferring now, rather than later,
will improve value, based on the underlying uncertainty in the dynamics. In
particular, we focus on non-stationarity in the dynamics to accurately learn
the deferral policy. We demonstrate that our pre-emptive deferral can identify
regions where the current policy has a low probability of improving outcomes.
SLTD outperforms existing non-sequential learning-to-defer baselines, whilst
reducing overall uncertainty, on multiple synthetic and real-world simulators
with non-stationary dynamics. We further derive and decompose the propagated
(long-term) uncertainty so that the domain expert can interpret it and gauge
when the model's performance is reliable.
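To illustrate the kind of deferral rule described above, the following is a minimal, hypothetical sketch (not the authors' implementation): an ensemble of sampled dynamics models stands in for the uncertainty over dynamics, and the agent defers now if the probability that the expert improves value exceeds a threshold. The `dynamics_model.step`, `model_policy`, and `expert_policy` interfaces are assumptions made for illustration.

```python
import numpy as np

def rollout_value(dynamics_model, policy, state, horizon, gamma=0.99):
    """Monte-Carlo estimate of the discounted return of `policy` under one
    sampled dynamics model (assumed to expose a step(state, action) method
    returning the next state and a reward)."""
    value, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = dynamics_model.step(state, action)
        value += discount * reward
        discount *= gamma
    return value

def should_defer_now(dynamics_ensemble, model_policy, expert_policy,
                     state, horizon=20, threshold=0.8):
    """Defer pre-emptively if, across the ensemble of plausible dynamics,
    the expert policy is likely to achieve higher value than the current policy."""
    improvements = []
    for dyn in dynamics_ensemble:
        v_model = rollout_value(dyn, model_policy, state, horizon)
        v_expert = rollout_value(dyn, expert_policy, state, horizon)
        improvements.append(v_expert > v_model)
    p_improve = float(np.mean(improvements))  # P(deferring now improves value)
    return p_improve >= threshold, p_improve
```

The ensemble here is only a crude stand-in for a posterior over non-stationary dynamics; SLTD's actual deferral policy and its decomposition of propagated uncertainty are derived in the paper.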
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Rich-Observation Reinforcement Learning with Continuous Latent Dynamics [43.84391209459658]
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations.
Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient.
arXiv Detail & Related papers (2024-05-29T17:02:49Z)
- Pausing Policy Learning in Non-stationary Reinforcement Learning [23.147618992106867]
We challenge the common belief that continually updating the decision is optimal for minimizing the temporal gap.
We propose a forecasting-based online reinforcement learning framework and show that strategically pausing decision updates yields better overall performance.
arXiv Detail & Related papers (2024-05-25T04:38:09Z)
- Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness [30.44174123736964]
We introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML.
Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches.
arXiv Detail & Related papers (2024-02-19T17:44:35Z)
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
arXiv Detail & Related papers (2023-05-28T10:52:46Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- ESC-Rules: Explainable, Semantically Constrained Rule Sets [11.160515561004619]
We describe a novel approach to explainable prediction of a continuous variable based on learning fuzzy weighted rules.
Our model trains a set of weighted rules to maximise prediction accuracy and minimise an ontology-based 'semantic loss' function.
This system fuses quantitative sub-symbolic learning with symbolic learning and constraints based on domain knowledge.
arXiv Detail & Related papers (2022-08-26T09:29:30Z)
- A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z)
- DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is the part of out-of-sample prediction error that stems from the learner's lack of knowledge.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty (a toy sketch of this decomposition appears after this list).
arXiv Detail & Related papers (2021-02-16T23:50:35Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
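For the DEUP entry above, the decomposition can be made concrete with a minimal, hypothetical sketch (function names and interfaces are illustrative assumptions, not the paper's code): epistemic uncertainty is estimated as the predicted out-of-sample error minus a predicted aleatoric (irreducible) component.

```python
import numpy as np

def deup_style_epistemic(error_predictor, aleatoric_predictor, x):
    """Toy DEUP-style decomposition: epistemic uncertainty at input x is the
    predicted generalization error minus the predicted aleatoric noise, clipped
    at zero since uncertainty cannot be negative. Both predictors are assumed
    to have been trained separately (e.g. on held-out prediction errors)."""
    total_error = np.asarray(error_predictor(x))    # predicted out-of-sample error
    aleatoric = np.asarray(aleatoric_predictor(x))  # predicted irreducible noise
    return np.maximum(total_error - aleatoric, 0.0)
```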