Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data
- URL: http://arxiv.org/abs/2312.10569v2
- Date: Wed, 20 Mar 2024 21:06:43 GMT
- Title: Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data
- Authors: Srikar Katta, Harsh Parikh, Cynthia Rudin, Alexander Volfovsky
- Abstract summary: We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making.
We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
- Score: 62.56890808004615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of information by instead representing the data as distributions. We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making: Analyzing Distributional Data via Matching After Learning to Stretch (ADD MALTS). We (i) provide analytical guarantees of the correctness of our estimation strategy, (ii) demonstrate via simulation that ADD MALTS outperforms other distributional data analysis methods at estimating treatment effects, and (iii) illustrate ADD MALTS' ability to verify whether there is enough cohesion between treatment and control units within subpopulations to trustworthily estimate treatment effects. We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
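The core idea is concrete enough to sketch. Below is a minimal, illustrative sketch of distributional matching in the spirit of ADD MALTS, not the authors' implementation: each unit's sensor stream is summarized as an empirical quantile function, a per-quantile "stretch" weight rescales a 2-Wasserstein distance, and treatment effects are estimated from matched groups. The function names and the fixed weights are assumptions for illustration; in the paper the stretch weights are learned.

```python
# Illustrative sketch of distributional matching (not the authors' code).
import numpy as np

def quantile_repr(samples, n_q=50):
    """Represent a unit's sensor readings as an empirical quantile function."""
    return np.quantile(samples, np.linspace(0.01, 0.99, n_q))

def stretched_w2(q_a, q_b, w):
    """Weighted 2-Wasserstein distance between two quantile functions."""
    return np.sqrt(np.sum(w * (q_a - q_b) ** 2))

def match_and_estimate(Q, treated, y, w, k=5):
    """Match each treated unit to its k nearest controls in the stretched
    distance and average the outcome differences to estimate the ATT."""
    effects = []
    for i in np.where(treated)[0]:
        d = np.array([stretched_w2(Q[i], Q[j], w) if not treated[j] else np.inf
                      for j in range(len(Q))])
        nn = np.argsort(d)[:k]
        effects.append(y[i] - y[nn].mean())
    return float(np.mean(effects))

# Toy example: two groups whose glucose streams share a mean but differ in
# spread -- exactly the information a scalar summary would discard.
rng = np.random.default_rng(0)
n = 200
treated = rng.random(n) < 0.5
streams = [rng.normal(100, 25 if t else 10, size=500) for t in treated]
Q = np.array([quantile_repr(s) for s in streams])
y = np.array([s.std() * 0.1 + rng.normal() for s in streams])  # synthetic outcome
w = np.ones(Q.shape[1])  # in ADD MALTS the stretch weights are learned, not fixed
print("ATT estimate:", match_and_estimate(Q, treated, y, w))
```

Matching also makes the cohesion check from point (iii) concrete: if a treated unit's nearest controls are all far away in the stretched distance, its effect estimate can be flagged rather than trusted.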
Related papers
- A Semiparametric Approach to Causal Inference [2.092897805817524]
In causal inference, an important problem is to quantify the effects of interventions or treatments.
In this paper, we employ a semiparametric density ratio model (DRM) to characterize the counterfactual distributions.
Our model offers flexibility by avoiding strict parametric assumptions on the counterfactual distributions.
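Density ratio models typically take an exponential-tilt form. A standard formulation, shown here as a generic sketch (the paper's choice of basis function q may differ), links the counterfactual densities as:

```latex
% Exponential-tilt DRM: the treated counterfactual density f_1 is a
% semiparametric tilt of the control density f_0; q(y) is a user-chosen
% basis (e.g., q(y) = y or q(y) = (y, y^2)), so f_0 stays unspecified.
f_1(y) = f_0(y)\,\exp\{\alpha + \beta^{\top} q(y)\}
```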
arXiv Detail & Related papers (2024-11-01T18:03:38Z)
- Estimating Individual Dose-Response Curves under Unobserved Confounders from Observational Data [6.166869525631879]
We present ContiVAE, a novel framework for estimating causal effects of continuous treatments, measured by individual dose-response curves.
We show that ContiVAE outperforms existing methods by up to 62%, demonstrating its robustness and flexibility.
arXiv Detail & Related papers (2024-10-21T07:24:26Z)
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- Estimating Dyadic Treatment Effects with Unknown Confounders [0.0]
We propose a statistical inference method for assessing treatment effects with dyadic data.
Under the assumption that the treatments follow an exchangeable distribution, our approach allows for the presence of unobserved confounding factors.
We apply our method to international trade data to assess the impact of free trade agreements on bilateral trade flows.
arXiv Detail & Related papers (2024-05-26T12:32:14Z)
- LMD3: Language Model Data Density Dependence [78.76731603461832]
We develop a methodology for analyzing language model task performance at the individual example level based on training data density estimation.
Experiments with paraphrasing as a controlled intervention on finetuning data demonstrate that increasing the support in the training distribution for specific test queries results in a measurable increase in density.
We conclude that our framework can provide statistical evidence of the dependence of a target model's predictions on subsets of its training data.
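A minimal sketch of the density-estimation ingredient, assuming embeddings of training examples and test queries are available (hypothetical setup, not the authors' pipeline): fit a kernel density estimator on training embeddings, then score each test query by its estimated density under the training distribution.

```python
# Hypothetical sketch: score test queries by training-data density.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
train_emb = rng.normal(size=(1000, 16))  # stand-in for training-example embeddings
test_emb = rng.normal(size=(10, 16))     # stand-in for test-query embeddings

kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(train_emb)
log_density = kde.score_samples(test_emb)  # higher => better supported by training data
print(log_density)
```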
arXiv Detail & Related papers (2024-05-10T09:03:27Z)
- Hypothesis testing for matched pairs with missing data by maximum mean discrepancy: An application to continuous glucose monitoring [0.8399688944263843]
This paper proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data.
These estimators can detect differences in data distributions under different missingness mechanisms.
Data from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach.
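For reference, here is a minimal unbiased estimator of squared MMD with an RBF kernel for two complete samples; the paper's contribution is estimators that additionally handle missing observations in matched pairs, which this sketch omits.

```python
# Unbiased estimator of squared MMD with an RBF kernel (complete data only).
import numpy as np

def rbf(a, b, sigma=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    kxx, kyy, kxy = rbf(x, x, sigma), rbf(y, y, sigma), rbf(x, y, sigma)
    n, m = len(x), len(y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))  # drop diagonal terms
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=(100, 1))    # e.g., one arm's glucose readings
y = rng.normal(0.5, 1, size=(100, 1))  # e.g., the matched arm's readings
print("MMD^2 estimate:", mmd2_unbiased(x, y))
```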
arXiv Detail & Related papers (2022-06-03T14:20:11Z)
- The interventional Bayesian Gaussian equivalent score for Bayesian causal inference with unknown soft interventions [0.0]
In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions only pertaining to a subset of the study variables.
We define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown.
arXiv Detail & Related papers (2022-05-05T12:32:08Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
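The mechanism fits in a few lines. A sketch, assuming max-softmax confidences as the score (the paper also considers other scores such as negative entropy): pick the threshold so that the fraction of labeled source examples above it matches source accuracy, then apply it to the unlabeled target confidences.

```python
# Sketch of Average Thresholded Confidence (ATC).
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Pick t so that mean(conf > t) on source equals source accuracy."""
    acc = source_correct.mean()
    return np.quantile(source_conf, 1 - acc)

def atc_predict(target_conf, t):
    """Predicted target accuracy: fraction of confidences above t."""
    return (target_conf > t).mean()

rng = np.random.default_rng(3)
source_conf = rng.beta(5, 2, size=5000)          # stand-in model confidences
source_correct = rng.random(5000) < source_conf  # roughly calibrated labels
target_conf = rng.beta(4, 3, size=5000)          # shifted target confidences

t = atc_threshold(source_conf, source_correct)
print("Predicted target accuracy:", atc_predict(target_conf, t))
```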
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)