Related papers: Communication-Efficient Distributed Estimation and Inference for Cox's Model

Communication-Efficient Distributed Estimation and Inference for Cox's Model

URL: http://arxiv.org/abs/2302.12111v3
Date: Sun, 23 Jun 2024 17:38:15 GMT
Title: Communication-Efficient Distributed Estimation and Inference for Cox's Model
Authors: Pierre Bayle, Jianqing Fan, Zhipeng Lou,
Abstract summary: We develop communication-efficient iterative distributed algorithms for estimation and inference in the high-dimensional sparse Cox proportional hazards model. To construct confidence intervals for linear combinations of high-dimensional hazard regression coefficients, we introduce a novel debiased method. We provide valid and powerful distributed hypothesis tests for any coordinate element based on a decorrelated score test.
Score: 4.731404257629232
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motivated by multi-center biomedical studies that cannot share individual data due to privacy and ownership concerns, we develop communication-efficient iterative distributed algorithms for estimation and inference in the high-dimensional sparse Cox proportional hazards model. We demonstrate that our estimator, even with a relatively small number of iterations, achieves the same convergence rate as the ideal full-sample estimator under very mild conditions. To construct confidence intervals for linear combinations of high-dimensional hazard regression coefficients, we introduce a novel debiased method, establish central limit theorems, and provide consistent variance estimators that yield asymptotically valid distributed confidence intervals. In addition, we provide valid and powerful distributed hypothesis tests for any coordinate element based on a decorrelated score test. We allow time-dependent covariates as well as censored survival times. Extensive numerical experiments on both simulated and real data lend further support to our theory and demonstrate that our communication-efficient distributed estimators, confidence intervals, and hypothesis tests improve upon alternative methods.

Related papers

In-Context Parametric Inference: Point or Distribution Estimators? [66.22308335324239]
We show that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems. Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
arXiv Detail & Related papers (2025-02-17T10:00:24Z)
Statistical Inference in Tensor Completion: Optimal Uncertainty Quantification and Statistical-to-Computational Gaps [7.174572371800217]
This paper presents a simple yet efficient method for statistical inference of tensor linear forms using incomplete and noisy observations. It is suitable for various statistical inference tasks, including constructing confidence intervals, inference under heteroskedastic and sub-exponential noise, and simultaneous testing.
arXiv Detail & Related papers (2024-10-15T03:09:52Z)
Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We further extend our analysis to the case where the test point has non-trivial correlations with the training set, setting often encountered in time series forecasting. We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data [10.678533056953784]
We propose a discriminative approach to estimate the total variation (TV) distance between two distributions. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved.
arXiv Detail & Related papers (2024-05-24T08:18:09Z)
Model Free Prediction with Uncertainty Assessment [7.524024486998338]
We propose a novel framework that transforms the deep estimation paradigm into a platform conducive to conditional mean estimation. We develop an end-to-end convergence rate for the conditional diffusion model and establish the normality of the generated samples. Through numerical experiments, we empirically validate the efficacy of our proposed methodology.
arXiv Detail & Related papers (2024-05-21T11:19:50Z)
Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated. We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner. We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data [13.48481978963297]
Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information. We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
arXiv Detail & Related papers (2021-06-07T05:12:42Z)
DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data. Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information. Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models. We provide a unifying view of these estimators under the framework of regularized nonparametric regression. We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.