Personalized Treatment Effect Estimation from Unstructured Data
- URL: http://arxiv.org/abs/2507.20993v1
- Date: Mon, 28 Jul 2025 16:52:31 GMT
- Title: Personalized Treatment Effect Estimation from Unstructured Data
- Authors: Henri Arno, Thomas Demeester
- Abstract summary: We introduce an approximate 'plug-in' method trained directly on the neural representations of unstructured data. We then introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings.
- Score: 8.468367158186007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for estimating personalized treatment effects typically rely on structured covariates, limiting their applicability to unstructured data. Yet, leveraging unstructured data for causal inference has considerable application potential, for instance in healthcare, where clinical notes or medical images are abundant. To this end, we first introduce an approximate 'plug-in' method trained directly on the neural representations of unstructured data. However, when these fail to capture all confounding information, the method may be subject to confounding bias. We therefore introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training, but allow estimating personalized treatment effects purely from unstructured inputs, while avoiding confounding bias. When these structured measurements are only available for a non-representative subset of the data, these estimators may suffer from sampling bias. To address this, we further introduce a regression-based correction that accounts for the non-uniform sampling, assuming the sampling mechanism is known or can be well-estimated. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings, despite its simplicity.
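As a rough illustration of the plug-in idea in the abstract, the sketch below fits one outcome regression per treatment arm on precomputed representations and takes their difference as the personalized effect estimate. This is an assumption about what a plug-in estimator could look like (a T-learner-style construction with ridge regression), not the authors' implementation. The names `fit_ridge`, `predict`, and `plug_in_cate`, the synthetic data, and the linear outcome model are all hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with an appended intercept column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    """Apply ridge weights (last entry is the intercept) to new inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def plug_in_cate(phi, t, y, phi_new, lam=1e-3):
    """Hypothetical plug-in estimate: difference of two outcome models.

    phi     : (n, d) representations of the unstructured inputs (assumed given)
    t       : (n,) binary treatment indicator
    y       : (n,) observed outcomes
    phi_new : (m, d) representations at which to estimate the effect
    """
    w1 = fit_ridge(phi[t == 1], y[t == 1], lam)  # outcome model, treated arm
    w0 = fit_ridge(phi[t == 0], y[t == 0], lam)  # outcome model, control arm
    return predict(w1, phi_new) - predict(w0, phi_new)


# Synthetic check (assumption: the representation contains every confounder).
rng = np.random.default_rng(0)
n, d = 2000, 5
phi = rng.normal(size=(n, d))
propensity = 1.0 / (1.0 + np.exp(-phi[:, 0]))  # treatment depends on phi[:, 0]
t = rng.binomial(1, propensity)
true_cate = 1.0 + 2.0 * phi[:, 1]
y = phi[:, 0] + t * true_cate + 0.1 * rng.normal(size=n)

est = plug_in_cate(phi, t, y, phi)
rmse = float(np.sqrt(np.mean((est - true_cate) ** 2)))
print(f"RMSE of estimated effect: {rmse:.4f}")
```

In this synthetic setup the representation contains every confounder, so the estimate tracks the true effect closely. If the representation omitted a confounder, the same code would generally be biased, which is the failure mode the abstract warns about.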
Related papers
- Simulating Biases for Interpretable Fairness in Offline and Online Classifiers [0.35998666903987897]
Mitigation methods are critical to ensure that model outcomes are adjusted to be fair.
We develop a framework for synthetic dataset generation with controllable bias injection.
In experiments, both offline and online learning approaches are employed.
arXiv Detail & Related papers (2025-07-14T11:04:24Z)
- Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.
We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
- A Unifying Framework for Robust and Efficient Inference with Unstructured Data [2.07180164747172]
This paper presents a general framework for conducting efficient inference on parameters derived from unstructured data.
We formalize this approach with MAR-S, a framework that unifies and extends existing methods for debiased inference.
Within this framework, we develop robust and efficient estimators for both descriptive and causal estimands.
arXiv Detail & Related papers (2025-05-01T04:11:25Z)
- A Partial Initialization Strategy to Mitigate the Overfitting Problem in CATE Estimation with Hidden Confounding [44.874826691991565]
Estimating the conditional average treatment effect (CATE) from observational data plays a crucial role in areas such as e-commerce, healthcare, and economics.
Existing studies mainly rely on the strong ignorability assumption that there are no hidden confounders.
Data collected from randomized controlled trials (RCT) do not suffer from confounding but are usually limited by a small sample size.
arXiv Detail & Related papers (2025-01-15T15:58:16Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts.
We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- Mitigating Dataset Bias by Using Per-sample Gradient [9.290757451344673]
We propose PGD (Per-sample Gradient-based Debiasing), which comprises three steps: training a model with uniform batch sampling, setting the importance of each sample in proportion to the norm of its gradient, and training the model with importance-batch sampling.
Compared with existing baselines on various synthetic and real-world datasets, the proposed method showed state-of-the-art accuracy for the classification task.
arXiv Detail & Related papers (2022-05-31T11:41:02Z)
- Spectral Clustering with Variance Information for Group Structure Estimation in Panel Data [7.712669451925186]
We first conduct a local analysis which reveals that the variances of the individual coefficient estimates contain useful information for the estimation of group structure.
We then propose a method to estimate unobserved groupings for general panel data models that explicitly account for the variance information.
arXiv Detail & Related papers (2022-01-05T19:16:16Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference directly on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Weight-of-evidence 2.0 with shrinkage and spline-binning [3.925373521409752]
We propose a formalized, data-driven and powerful method to transform categorical predictors.
We extend upon the weight-of-evidence approach and propose to estimate the proportions using shrinkage estimators.
We present the results of a series of experiments in a fraud detection setting, which illustrate the effectiveness of the presented approach.
arXiv Detail & Related papers (2021-01-05T13:13:16Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.