Calibrated Optimal Decision Making with Multiple Data Sources and
Limited Outcome
- URL: http://arxiv.org/abs/2104.10554v1
- Date: Wed, 21 Apr 2021 14:24:17 GMT
- Title: Calibrated Optimal Decision Making with Multiple Data Sources and
Limited Outcome
- Authors: Hengrui Cai, Wenbin Lu, Rui Song
- Abstract summary: We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available.
This paper proposes a novel calibrated optimal decision rule (CODR) to address the limited outcome.
- Score: 20.60767385364074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the optimal decision-making problem in a primary sample of
interest with multiple auxiliary sources available. The outcome of interest is
limited in the sense that it is only observed in the primary sample. In
reality, such multiple data sources may belong to different populations and
thus cannot be combined directly. This paper proposes a novel calibrated
optimal decision rule (CODR) to address the limited outcome, by leveraging the
shared pattern in multiple data sources. Under a mild and testable assumption
that the conditional means of intermediate outcomes in different samples are
equal given baseline covariates and the treatment information, we can show that
the calibrated mean outcome of interest under the CODR is unbiased and more
efficient than using the primary sample solely. Extensive experiments on
simulated datasets demonstrate empirical validity and improvement of the
proposed CODR, followed by a real application on the MIMIC-III as the primary
sample with auxiliary data from eICU.
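
The efficiency claim in the abstract can be illustrated with a minimal control-variate sketch. This is not the paper's CODR estimator: it fixes the treatment rule, ignores covariates, and assumes the intermediate outcome has the same marginal mean in both samples, which is stronger than the conditional-mean assumption stated above. The names `calibrated_mean` and `simulate`, and the simulated data model, are hypothetical and chosen only for illustration.

```python
import numpy as np


def calibrated_mean(y_primary, m_primary, m_aux):
    """Control-variate style calibration of the primary-sample mean outcome.

    y_primary: outcome of interest, observed only in the primary sample.
    m_primary: intermediate outcome in the primary sample.
    m_aux:     intermediate outcome in the auxiliary sample.
    """
    y_primary = np.asarray(y_primary, dtype=float)
    m_primary = np.asarray(m_primary, dtype=float)
    m_aux = np.asarray(m_aux, dtype=float)
    n, n_aux = len(y_primary), len(m_aux)

    naive = y_primary.mean()
    # Assumption: this difference has mean zero because both samples
    # share the same intermediate-outcome mean.
    diff = m_primary.mean() - m_aux.mean()

    # Plug-in estimates of Cov(naive, diff) and Var(diff); the auxiliary
    # sample is treated as independent of the primary sample.
    cov_nd = np.cov(y_primary, m_primary, ddof=1)[0, 1] / n
    var_d = m_primary.var(ddof=1) / n + m_aux.var(ddof=1) / n_aux
    rho = cov_nd / var_d

    # Subtract the projection of the naive estimate onto the difference.
    return naive - rho * diff


def simulate(n_reps=300, n=100, n_aux=2000, seed=0):
    """Compare naive and calibrated estimates on simulated data (true mean 0)."""
    rng = np.random.default_rng(seed)
    naive, calibrated = [], []
    for _ in range(n_reps):
        m_p = rng.normal(size=n)
        y_p = 2.0 * m_p + 0.5 * rng.normal(size=n)
        m_a = rng.normal(size=n_aux)
        naive.append(y_p.mean())
        calibrated.append(calibrated_mean(y_p, m_p, m_a))
    return np.array(naive), np.array(calibrated)


if __name__ == "__main__":
    naive, calibrated = simulate()
    print("variance of naive estimate:     ", naive.var())
    print("variance of calibrated estimate:", calibrated.var())
```

In this toy setting both estimates target the same mean, and the calibrated one should vary less across replications because the auxiliary sample pins down the intermediate-outcome mean more precisely than the primary sample alone.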
Related papers
- On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations [10.931620604044486]
Public data commonly used to train Log Anomaly Detection models suffers from class imbalance.
Mitigating class imbalance through data resampling has proven effective for other software engineering tasks.
This study provides an in-depth analysis of the impact of diverse data resampling methods on existing AD approaches.
arXiv Detail & Related papers (2024-05-06T14:01:05Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - Optimal Condition Training for Target Source Separation [56.86138859538063]
We propose a new optimal condition training method for single-channel target source separation.
We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
arXiv Detail & Related papers (2022-11-11T00:04:55Z) - Rethinking Collaborative Metric Learning: Toward an Efficient
Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS).
We find that negative sampling would lead to a biased estimation of the generalization error.
Motivated by this, we propose an efficient alternative without negative sampling for CML named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z) - Estimation of Local Average Treatment Effect by Data Combination [3.655021726150368]
It is important to estimate the local average treatment effect (LATE) when compliance with a treatment assignment is incomplete.
Previously proposed methods for LATE estimation required all relevant variables to be jointly observed in a single dataset.
We propose a weighted least squares estimator that enables simpler model selection by avoiding the minimax objective formulation.
arXiv Detail & Related papers (2021-09-11T03:51:48Z) - The Pitfalls of Sample Selection: A Case Study on Lung Nodule
Classification [13.376247652484274]
In lung nodule classification, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and assess the impact of individual contributions.
We find that each employs a different data selection process, leading to widely varying total numbers of samples and ratios between benign and malignant cases.
We show that specific choices can have severe impact on the data distribution where it may be possible to achieve superior performance on one sample distribution but not on another.
arXiv Detail & Related papers (2021-08-11T18:07:07Z) - GEAR: On Optimal Decision Making with Auxiliary Data [20.607673853640744]
Current optimal decision rule (ODR) methods usually require the primary outcome of interest in samples for assessing treatment effects, namely the experimental sample.
This paper is inspired to address this challenge by making use of an auxiliary sample to facilitate the estimation of ODR in the experimental sample.
We propose an auGmented inverse propensity weighted Experimental and Auxiliary sample-based decision Rule (GEAR) by maximizing the augmented inverse propensity weighted value estimator over a class of decision rules.
arXiv Detail & Related papers (2021-04-21T14:59:25Z) - Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.