Progressive Generalization Risk Reduction for Data-Efficient Causal Effect Estimation
- URL: http://arxiv.org/abs/2411.11256v1
- Date: Mon, 18 Nov 2024 03:17:40 GMT
- Title: Progressive Generalization Risk Reduction for Data-Efficient Causal Effect Estimation
- Authors: Hechuan Wen, Tong Chen, Guanhua Ye, Li Kheng Chai, Shazia Sadiq, Hongzhi Yin,
- Abstract summary: Causal effect estimation (CEE) provides a crucial tool for predicting the unobserved counterfactual outcome for an entity.
In this paper, we study a more realistic CEE setting where the labelled data samples are scarce at the beginning.
We propose the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition.
- Score: 30.49865329385806
- License:
- Abstract: Causal effect estimation (CEE) provides a crucial tool for predicting the unobserved counterfactual outcome for an entity. As CEE relaxes the requirement for ``perfect'' counterfactual samples (e.g., patients with identical attributes and only differ in treatments received) that are impractical to obtain and can instead operate on observational data, it is usually used in high-stake domains like medical treatment effect prediction. Nevertheless, in those high-stake domains, gathering a decently sized, fully labelled observational dataset remains challenging due to hurdles associated with costs, ethics, expertise and time needed, etc., of which medical treatment surveys are a typical example. Consequently, if the training dataset is small in scale, low generalization risks can hardly be achieved on any CEE algorithms. Unlike existing CEE methods that assume the constant availability of a dataset with abundant samples, in this paper, we study a more realistic CEE setting where the labelled data samples are scarce at the beginning, while more can be gradually acquired over the course of training -- assuredly under a limited budget considering their expensive nature. Then, the problem naturally comes down to actively selecting the best possible samples to be labelled, e.g., identifying the next subset of patients to conduct the treatment survey. However, acquiring quality data for reducing the CEE risk under limited labelling budgets remains under-explored until now. To fill the gap, we theoretically analyse the generalization risk from an intriguing perspective of progressively shrinking its upper bound, and develop a principled label acquisition pipeline exclusively for CEE tasks. With our analysis, we propose the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition, which aims to reduce both the CEE model's uncertainty and the post-acquisition ...
Related papers
- Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset [0.47517735516852333]
We leverage a linked EHR dataset to characterize urinary tract infections (UTIs) in Bristol, North Somerset, and South Gloucestershire, UK.
A comprehensive data pre-processing and curation pipeline transforms the raw EHR data into a structured format suitable for AI modeling.
We build a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines.
arXiv Detail & Related papers (2024-11-26T18:10:51Z) - SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with few missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Modeling Disagreement in Automatic Data Labelling for Semi-Supervised
Learning in Clinical Natural Language Processing [2.016042047576802]
We investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports.
arXiv Detail & Related papers (2022-05-29T20:20:49Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Active Deep Learning on Entity Resolution by Risk Sampling [5.219701379581547]
Active Learning (AL) presents itself as a feasible solution that focuses on data deemed useful for model training.
We propose a novel AL approach of risk sampling for entity resolution (ER)
Based on the core-set characterization for AL, we theoretically derive an optimization model which aims to minimize core-set loss with non-uniform continuity.
We empirically verify the efficacy of the proposed approach on real data by a comparative study.
arXiv Detail & Related papers (2020-12-23T20:38:25Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Efficient Estimation and Evaluation of Prediction Rules in
Semi-Supervised Settings under Stratified Sampling [6.930951733450623]
We propose a two-step semi-supervised learning (SSL) procedure for evaluating a prediction rule derived from a working binary regression model.
In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for nonrandom sampling.
In step II, we augment the initial imputations to ensure the consistency of the resulting estimators.
arXiv Detail & Related papers (2020-10-19T12:54:45Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Causal Inference With Selectively Deconfounded Data [22.624714904663424]
We consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the Average Treatment Effect (ATE)
Our theoretical results suggest that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level.
arXiv Detail & Related papers (2020-02-25T18:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.