Calibrated Optimal Decision Making with Multiple Data Sources and
Limited Outcome
- URL: http://arxiv.org/abs/2104.10554v1
- Date: Wed, 21 Apr 2021 14:24:17 GMT
- Title: Calibrated Optimal Decision Making with Multiple Data Sources and
Limited Outcome
- Authors: Hengrui Cai, Wenbin Lu, Rui Song
- Abstract summary: We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available.
This paper proposes a novel calibrated optimal decision rule (CODR) to address the limited outcome.
- Score: 20.60767385364074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the optimal decision-making problem in a primary sample of
interest with multiple auxiliary sources available. The outcome of interest is
limited in the sense that it is only observed in the primary sample. In
reality, such multiple data sources may belong to different populations and
thus cannot be combined directly. This paper proposes a novel calibrated
optimal decision rule (CODR) to address the limited outcome, by leveraging the
shared pattern in multiple data sources. Under a mild and testable assumption
that the conditional means of intermediate outcomes in different samples are
equal given baseline covariates and treatment information, we show that the
calibrated mean outcome of interest under the CODR is unbiased and more
efficient than the estimate based on the primary sample alone. Extensive
experiments on simulated datasets demonstrate the empirical validity and
efficiency gains of the proposed CODR, followed by a real application using
MIMIC-III as the primary sample with auxiliary data from eICU.
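A minimal sketch of the calibration idea, under the stated assumption that the
conditional mean of the intermediate outcome given covariates and treatment is
shared across data sources: pool all sources to learn the intermediate-outcome
regression, and use the primary sample alone for the final-outcome model. The
plug-in construction and all function names below are illustrative
simplifications, not the authors' estimator.

```python
# Illustrative sketch of a calibrated plug-in value estimate (not the paper's
# CODR estimator). Assumption used: E[M | X, A] is identical in all sources.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_intermediate_model(X_sources, A_sources, M_sources):
    """Fit E[M | X, A] on data pooled across the primary and auxiliary sources."""
    X = np.vstack(X_sources)
    A = np.concatenate(A_sources)
    M = np.concatenate(M_sources)
    return LinearRegression().fit(np.column_stack([X, A]), M)

def fit_outcome_model(X_p, A_p, M_p, Y_p):
    """Fit E[Y | X, A, M] on the primary sample, the only source where Y is observed."""
    return LinearRegression().fit(np.column_stack([X_p, A_p, M_p]), Y_p)

def calibrated_value(rule, X_p, m_model, y_model):
    """Plug-in estimate of the mean outcome if the primary population followed `rule`.

    `rule` maps covariates to treatments in {0, 1}. The intermediate outcome
    under the rule's treatment is imputed from the pooled (calibrated) model,
    then mapped to the final outcome through the primary-sample outcome model.
    """
    A_d = rule(X_p)                                           # treatments the rule assigns
    M_hat = m_model.predict(np.column_stack([X_p, A_d]))      # calibrated intermediate outcome
    Y_hat = y_model.predict(np.column_stack([X_p, A_d, M_hat]))
    return float(Y_hat.mean())
```

A CODR-style rule would then be obtained by maximizing calibrated_value over a
candidate class of decision rules; the paper's actual estimator and its
unbiasedness and efficiency guarantees involve details not reflected here.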
Related papers
- Federated Causal Inference: Multi-Centric ATE Estimation beyond Meta-Analysis [12.896319628045967]
We study Federated Causal Inference, an approach for estimating treatment effects from decentralized data across centers.
We compare three classes of Average Treatment Effect (ATE) estimators derived from the Plug-in G-Formula.
arXiv Detail & Related papers (2024-10-22T10:19:17Z) - Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty.
We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods.
We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv Detail & Related papers (2024-08-22T15:20:32Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - Optimal Condition Training for Target Source Separation [56.86138859538063]
We propose a new optimal condition training method for single-channel target source separation.
We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
arXiv Detail & Related papers (2022-11-11T00:04:55Z) - Rethinking Collaborative Metric Learning: Toward an Efficient
Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS).
We find that negative sampling would lead to a biased estimation of the generalization error.
Motivated by this, we propose an efficient alternative without negative sampling for CML, named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z) - Estimation of Local Average Treatment Effect by Data Combination [3.655021726150368]
It is important to estimate the local average treatment effect (LATE) when compliance with a treatment assignment is incomplete.
Previously proposed methods for LATE estimation required all relevant variables to be jointly observed in a single dataset.
We propose a weighted least squares estimator that enables simpler model selection by avoiding the minimax objective formulation.
arXiv Detail & Related papers (2021-09-11T03:51:48Z) - The Pitfalls of Sample Selection: A Case Study on Lung Nodule
Classification [13.376247652484274]
In lung nodule classification, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and an assessment of the impact of individual contributions.
We find that each employs a different data selection process, leading to widely varying total numbers of samples and ratios of benign to malignant cases.
We show that specific choices can have a severe impact on the data distribution, such that it may be possible to achieve superior performance on one sample distribution but not on another.
arXiv Detail & Related papers (2021-08-11T18:07:07Z) - GEAR: On Optimal Decision Making with Auxiliary Data [20.607673853640744]
Current optimal decision rule (ODR) methods usually require the primary outcome of interest to be observed in the sample used for assessing treatment effects, namely the experimental sample.
This paper addresses this challenge by making use of an auxiliary sample to facilitate the estimation of the ODR in the experimental sample.
We propose an auGmented inverse propensity weighted Experimental and Auxiliary sample-based decision Rule (GEAR) by maximizing the augmented inverse propensity weighted value estimator over a class of decision rules (a minimal sketch of an AIPW value estimator follows this list).
arXiv Detail & Related papers (2021-04-21T14:59:25Z) - Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
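Referring back to the GEAR entry above, the following is a generic sketch of an
augmented inverse propensity weighted (AIPW) value estimator for a binary
decision rule, assuming pre-fitted nuisance models; it illustrates the
estimator class GEAR maximizes, not the GEAR implementation itself.

```python
# Generic AIPW value estimate for a binary decision rule d: X -> {0, 1}.
# prop_model and outcome_model are assumed pre-fitted nuisance models: a
# classifier for the propensity score P(A = 1 | X) and a regressor for
# E[Y | X, A]. This is an illustration, not code from the GEAR paper.
import numpy as np

def aipw_value(rule, X, A, Y, prop_model, outcome_model):
    """AIPW estimate of the mean outcome if treatment were assigned by `rule`."""
    A_d = rule(X)                                             # treatment the rule would assign
    p1 = prop_model.predict_proba(X)[:, 1]                    # estimated P(A = 1 | X)
    prop_d = np.where(A_d == 1, p1, 1.0 - p1)                 # estimated P(A = d(X) | X)
    mu_d = outcome_model.predict(np.column_stack([X, A_d]))   # estimated E[Y | X, A = d(X)]
    follows = (A == A_d).astype(float)                        # 1 if observed treatment matches the rule
    return float(np.mean(follows / prop_d * (Y - mu_d) + mu_d))
```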