Related papers: Calibrated Optimal Decision Making with Multiple Data Sources and Limited Outcome

URL: http://arxiv.org/abs/2104.10554v1
Date: Wed, 21 Apr 2021 14:24:17 GMT
Title: Calibrated Optimal Decision Making with Multiple Data Sources and Limited Outcome
Authors: Hengrui Cai, Wenbin Lu, Rui Song
Abstract summary: We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available. This paper proposes a novel calibrated optimal decision rule (CODR) to address the limited outcome.
Score: 20.60767385364074
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available. The outcome of interest is limited in the sense that it is only observed in the primary sample. In reality, such multiple data sources may belong to different populations and thus cannot be combined directly. This paper proposes a novel calibrated optimal decision rule (CODR) to address the limited outcome, by leveraging the shared pattern in multiple data sources. Under a mild and testable assumption that the conditional means of intermediate outcomes in different samples are equal given baseline covariates and the treatment information, we can show that the calibrated mean outcome of interest under the CODR is unbiased and more efficient than using the primary sample solely. Extensive experiments on simulated datasets demonstrate empirical validity and improvement of the proposed CODR, followed by a real-data application using MIMIC-III as the primary sample with auxiliary data from eICU.
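To make the calibration idea concrete, here is a minimal control-variate sketch under simplifying assumptions of our own; it is not the authors' CODR estimator. It assumes a binary covariate X and treatment A, treatment randomized with known propensity 0.5, a fixed decision rule, and auxiliary samples that record (X, A, M) but not Y. Two estimates of the mean intermediate outcome M under the rule are formed: an IPW estimate from the primary sample alone, and a plug-in estimate from conditional means pooled across samples. Under the shared-conditional-mean assumption their gap has mean approximately zero, so it can be subtracted from the primary-sample value estimate with a bootstrap-estimated coefficient. All function names and the data-generating model are hypothetical.

```python
import numpy as np

def simulate(n, p_x, rng, with_y=True):
    """Hypothetical data: binary X, randomized binary A, intermediate M, final Y."""
    x = rng.binomial(1, p_x, n)
    a = rng.binomial(1, 0.5, n)                  # P(A = 1 | X) = 0.5 in every sample
    # E[M | X, A] is the same function in every sample (the shared-mean assumption).
    m = 1.0 + 0.5 * x + a * (2 * x - 1) + rng.normal(0, 1, n)
    if not with_y:
        return x, a, m
    y = m + rng.normal(0, 0.5, n)                # Y observed only in the primary sample
    return x, a, m, y

def cell_means(x, a, m):
    """Mean of m within each (x, a) cell, as a 2x2 array indexed [x, a]."""
    out = np.empty((2, 2))
    for xv in (0, 1):
        for av in (0, 1):
            out[xv, av] = m[(x == xv) & (a == av)].mean()
    return out

def estimates(prim, aux, rule, propensity=0.5):
    """IPW value of the rule from Y, and the gap between two estimates of E[M under rule]."""
    x, a, m, y = prim
    d = rule(x)
    w = (a == d) / propensity                    # inverse-probability weights
    v_primary = np.mean(w * y)
    theta_primary = np.mean(w * m)               # same IPW estimator applied to M
    xa, aa, ma = aux
    pooled = cell_means(np.concatenate([x, xa]), np.concatenate([a, aa]),
                        np.concatenate([m, ma]))
    theta_pooled = np.mean(pooled[x, d])         # plug-in over the primary X distribution
    return v_primary, theta_primary - theta_pooled

def calibrated_value(prim, aux, rule, n_boot=200, seed=0):
    """Control-variate adjustment V - gamma * gap, with gamma estimated by bootstrap."""
    rng = np.random.default_rng(seed)
    v_hat, gap_hat = estimates(prim, aux, rule)
    n_p, n_a = len(prim[0]), len(aux[0])
    boot = np.empty((n_boot, 2))
    for b in range(n_boot):
        ip = rng.integers(0, n_p, n_p)           # resample each sample independently
        ia = rng.integers(0, n_a, n_a)
        boot[b] = estimates(tuple(c[ip] for c in prim),
                            tuple(c[ia] for c in aux), rule)
    cov = np.cov(boot, rowvar=False)
    gamma = cov[0, 1] / cov[1, 1]                # variance-minimizing coefficient
    return v_hat - gamma * gap_hat, v_hat, gamma

rng = np.random.default_rng(1)
primary = simulate(2_000, 0.5, rng)
auxiliary = simulate(20_000, 0.7, rng, with_y=False)   # different X distribution
rule = lambda x: x                                       # treat exactly when X = 1
est, naive, gamma = calibrated_value(primary, auxiliary, rule)
print(est, naive, gamma)
```

Under this hypothetical model the true value of the rule in the primary population is 1.75. How much variance the adjustment removes depends on how strongly Y and M are correlated; here Y = M + noise, so the correlation is high.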

Related papers

Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm [41.4789135538612]
This paper introduces a novel choice-based sample selection framework that shifts the focus from evaluating individual sample quality to comparing the contribution value of different samples. Thanks to the advanced language understanding capabilities of Large Language Models (LLMs), we utilize LLMs to evaluate the value of each option during the selection process.
arXiv (2025-03-04T07:32:41Z)
Federated Causal Inference: Multi-Centric ATE Estimation beyond Meta-Analysis [12.896319628045967]
We study Federated Causal Inference, an approach to estimate treatment effects from decentralized data across centers. We compare three classes of Average Treatment Effect (ATE) estimators derived from the Plug-in G-Formula.
arXiv (2024-10-22T10:19:17Z)
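The plug-in G-formula estimates the ATE by fitting an outcome model $\hat\mu_a(x) \approx E[Y \mid X=x, A=a]$ and averaging $\hat\mu_1(X_i) - \hat\mu_0(X_i)$ over the observed covariates. The sketch below computes it locally in three hypothetical centers with different covariate distributions and then combines only the local estimates, weighted by sample size, so that no individual-level data leave a center. The linear outcome model, the simulated centers and this particular aggregation rule are assumptions for illustration; they are not claimed to match any of the three estimator classes compared in the paper.

```python
import numpy as np

def g_formula_ate(x, a, y):
    """Plug-in G-formula with a per-arm linear outcome model."""
    preds = []
    for arm in (0, 1):
        mask = a == arm
        design = np.column_stack([np.ones(mask.sum()), x[mask]])
        beta = np.linalg.lstsq(design, y[mask], rcond=None)[0]
        preds.append(beta[0] + beta[1] * x)     # predict for every unit in the center
    return np.mean(preds[1] - preds[0])

rng = np.random.default_rng(0)
centers = []
for shift in (-1.0, 0.0, 1.5):                  # centers differ in covariate distribution
    n = 5_000
    x = rng.normal(shift, 1.0, n)
    a = rng.binomial(1, 1 / (1 + np.exp(-x)))   # confounded: P(A = 1) rises with x
    y = x + a * (1.0 + 0.5 * x) + rng.normal(0, 1, n)   # true CATE is 1 + 0.5 x
    centers.append((x, a, y))

local = np.array([g_formula_ate(*c) for c in centers])  # computed inside each center
sizes = np.array([len(c[0]) for c in centers])
federated = np.sum(sizes * local) / sizes.sum()         # only summaries are shared
print(local, federated)
```

With equal center sizes, the weighted average targets the ATE over the union of the three center populations, here about 1 + 0.5 * mean(-1, 0, 1.5).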
Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv (2024-08-22T15:20:32Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv (2023-12-08T16:06:29Z)
Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies. We show that the likelihood of the available data has no local maxima. We then show how the same approach can address the general case of multiple datasets.
arXiv (2023-07-31T11:28:24Z)
Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation. We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv (2023-01-18T02:47:41Z)
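The Poisson distribution gives the simplest instance of data thinning for a convolution-closed family: if $X \sim \mathrm{Poisson}(\lambda)$ and $X_1 \mid X \sim \mathrm{Binomial}(X, \epsilon)$, then $X_1 \sim \mathrm{Poisson}(\epsilon\lambda)$ and $X_2 = X - X_1 \sim \mathrm{Poisson}((1-\epsilon)\lambda)$ are independent and sum to $X$. The sketch below checks this numerically. The function name `poisson_thin` is hypothetical, and other families (for example Gaussian or gamma) need a different splitting distribution that is not shown here.

```python
import numpy as np

def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two independent Poisson parts that sum to x."""
    x1 = rng.binomial(x, eps)      # each unit of the count goes to part 1 with prob eps
    return x1, x - x1

rng = np.random.default_rng(0)
lam, eps = 10.0, 0.3
x = rng.poisson(lam, size=200_000)
x1, x2 = poisson_thin(x, eps, rng)
print(x1.mean(), x2.mean())        # should be near eps*lam = 3 and (1-eps)*lam = 7
print(np.corrcoef(x1, x2)[0, 1])   # should be near 0
```

Because the two parts are independent, one can be used to fit a model (for example, to choose clusters) and the other to evaluate it.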
Optimal Condition Training for Target Source Separation [56.86138859538063]
We propose a new optimal condition training method for single-channel target source separation. We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
arXiv (2022-11-11T00:04:55Z)
Rethinking Collaborative Metric Learning: Toward an Efficient Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has attracted wide interest in the area of recommendation systems (RS). We find that negative sampling would lead to a biased estimation of the generalization error. Motivated by this, we propose an efficient alternative without negative sampling for CML, named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv (2022-06-23T08:50:22Z)
Estimation of Local Average Treatment Effect by Data Combination [3.655021726150368]
It is important to estimate the local average treatment effect (LATE) when compliance with a treatment assignment is incomplete. Previously proposed methods for LATE estimation required all relevant variables to be jointly observed in a single dataset. We propose a weighted least squares estimator that enables simpler model selection by avoiding the minimax objective formulation.
arXiv (2021-09-11T03:51:48Z)
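For reference, with a binary instrument Z (the treatment assignment) and a binary treatment D, the LATE under the standard instrumental-variable assumptions (including monotonicity) equals the Wald ratio $(E[Y \mid Z=1] - E[Y \mid Z=0]) / (E[D \mid Z=1] - E[D \mid Z=0])$. The sketch below computes that classical single-dataset estimator, which needs Y, D and Z jointly observed; it is not the paper's weighted least squares data-combination estimator. The simulation (60% compliers, one-sided noncompliance, true effect 2.0) is hypothetical.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald (IV) estimate of the local average treatment effect."""
    z = z.astype(bool)
    itt_y = y[z].mean() - y[~z].mean()   # effect of assignment on the outcome
    itt_d = d[z].mean() - d[~z].mean()   # effect of assignment on treatment take-up
    return itt_y / itt_d

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)                          # randomized assignment
complier = rng.random(n) < 0.6                       # others never take the treatment
d = ((z == 1) & complier).astype(float)              # one-sided noncompliance
y = 2.0 * d + 1.0 * complier + rng.normal(0, 1, n)   # compliers also differ at baseline

late = wald_late(y, d, z)
naive = y[d == 1].mean() - y[d == 0].mean()          # biased: compares self-selected groups
print(late, naive)
```

The naive treated-versus-untreated comparison absorbs the compliers' higher baseline, while the Wald ratio recovers the effect among compliers.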
The Pitfalls of Sample Selection: A Case Study on Lung Nodule Classification [13.376247652484274]
In lung nodule classification, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and an assessment of the impact of individual contributions. We find that each work employs a different data selection process, leading to widely varying total numbers of samples and ratios between benign and malignant cases. We show that specific choices can severely affect the data distribution, so that a method may achieve superior performance on one sample distribution but not on another.
arXiv (2021-08-11T18:07:07Z)
GEAR: On Optimal Decision Making with Auxiliary Data [20.607673853640744]
Current optimal decision rule (ODR) methods usually require the primary outcome of interest to be observed in the sample used for assessing treatment effects, namely the experimental sample. When that outcome cannot be observed in the experimental sample, this paper makes use of an auxiliary sample to facilitate the estimation of the ODR in the experimental sample. We propose an auGmented inverse propensity weighted Experimental and Auxiliary sample-based decision Rule (GEAR) by maximizing the augmented inverse propensity weighted value estimator over a class of decision rules.
arXiv (2021-04-21T14:59:25Z)
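In a single sample, the augmented inverse propensity weighted (AIPW) value of a rule $d$ has the standard form $\hat V(d) = n^{-1}\sum_i [\mathbb{1}\{A_i = d(X_i)\}\{Y_i - \hat Q(X_i, d(X_i))\}/\pi(d(X_i) \mid X_i) + \hat Q(X_i, d(X_i))]$. The sketch below evaluates it in one randomized sample with known propensity 0.5 and a per-arm linear outcome model $\hat Q$, then picks the best threshold rule $d_c(x) = \mathbb{1}\{x > c\}$ by grid search. It does not implement GEAR's combination of experimental and auxiliary samples; the data-generating model and function names are hypothetical.

```python
import numpy as np

def fit_q(x, a, y):
    """Per-arm linear outcome model; returns q(xs, arm) giving predicted E[Y | X, A=arm]."""
    coefs = {}
    for arm in (0, 1):
        mask = a == arm
        design = np.column_stack([np.ones(mask.sum()), x[mask]])
        coefs[arm] = np.linalg.lstsq(design, y[mask], rcond=None)[0]
    return lambda xs, arm: coefs[arm][0] + coefs[arm][1] * xs

def aipw_value(x, a, y, rule, p_treat, q):
    """AIPW estimate of the mean outcome if everyone followed `rule`."""
    d = rule(x)
    q_d = np.where(d == 1, q(x, 1), q(x, 0))        # outcome model at the recommended arm
    pi_d = np.where(d == 1, p_treat, 1 - p_treat)   # P(A = d(X) | X)
    return np.mean((a == d) / pi_d * (y - q_d) + q_d)

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-1, 1, n)
a = rng.binomial(1, 0.5, n)                         # randomized, P(A = 1 | X) = 0.5
y = 0.5 * x + 2.0 * a * (x - 0.2) + rng.normal(0, 0.5, n)   # treatment helps iff x > 0.2
q = fit_q(x, a, y)

grid = np.linspace(-1, 1, 41)                       # thresholds c for d_c(x) = 1{x > c}
values = [aipw_value(x, a, y, lambda v, c=c: (v > c).astype(int), 0.5, q) for c in grid]
best_c = grid[int(np.argmax(values))]
print(best_c, max(values))                          # threshold should be near 0.2
```

Under this hypothetical model the optimal rule treats when $x > 0.2$ and its true value is 0.32.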
Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources. We show theoretically that this reduces the variance of the ATE estimate. We apply this framework to inference from observational data under an outcome selection bias.
arXiv (2021-03-30T21:20:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.