Variance Reduced Median-of-Means Estimator for Byzantine-Robust
Distributed Inference
- URL: http://arxiv.org/abs/2103.02860v1
- Date: Thu, 4 Mar 2021 06:50:52 GMT
- Title: Variance Reduced Median-of-Means Estimator for Byzantine-Robust
Distributed Inference
- Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao, and Xi Chen
- Abstract summary: This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes.
To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning.
- Score: 9.983824799552774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper develops an efficient distributed inference algorithm, which is
robust against a moderate fraction of Byzantine nodes, namely arbitrary and
possibly adversarial machines in a distributed learning system. In robust
statistics, the median-of-means (MOM) has been a popular approach to hedge
against Byzantine failures due to its ease of implementation and computational
efficiency. However, the MOM estimator falls short in terms of statistical
efficiency. The first main contribution of the paper is to propose
a variance reduced median-of-means (VRMOM) estimator, which improves the
statistical efficiency over the vanilla MOM estimator and is computationally as
efficient as the MOM. Based on the proposed VRMOM estimator, we develop a
general distributed inference algorithm that is robust against Byzantine
failures. Theoretically, our distributed algorithm achieves a fast convergence
rate with only a constant number of rounds of communications. We also provide
the asymptotic normality result for the purpose of statistical inference. To
the best of our knowledge, this is the first normality result in the setting of
Byzantine-robust distributed learning. The simulation results are also
presented to illustrate the effectiveness of our method.
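For intuition, the vanilla MOM aggregation that the abstract contrasts with VRMOM can be sketched as below. This is a minimal single-process simulation with illustrative names (`median_of_means`, the block count of 20); it is not the paper's VRMOM algorithm, whose variance-reduction step is not shown. Each block plays the role of one machine, so a Byzantine machine corrupts an entire block.

```python
import numpy as np

def median_of_means(samples, num_blocks):
    """Split the samples into num_blocks blocks (one per machine in the
    distributed setting), average each block, and return the median of
    the block means. The median discards blocks ruined by outliers."""
    blocks = np.array_split(np.asarray(samples), num_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return np.median(block_means)

rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=1.0, size=900)   # honest machines' data
corrupted = np.full(100, 1e6)                      # two Byzantine machines
data = np.concatenate([clean, corrupted])          # corruption is block-aligned

print(data.mean())                  # plain mean: dominated by the outliers
print(median_of_means(data, 20))    # MOM: stays close to the true mean 1.0
```

Because the 100 corrupted values fill only 2 of the 20 blocks, the median of the block means ignores them, while the plain sample mean is pulled far from the true value of 1.0.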
Related papers
- Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA [5.749787074942513]
We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja's algorithm.
We derive a sharp Bernstein-type concentration bound for elements of the estimated vector matching the optimal error rate up to logarithmic factors.
To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm.
arXiv Detail & Related papers (2025-06-14T22:50:54Z)
- Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings [24.07815507403025]
Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare.
We analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2025-06-03T12:16:46Z)
- Robust Estimation for Kernel Exponential Families with Smoothed Total Variation Distances [2.317910166616341]
In statistical inference, we commonly assume that samples are independent and identically distributed from a probability distribution.
In this paper, we explore the application of GAN-like estimators to a general class of statistical models.
arXiv Detail & Related papers (2024-10-28T05:50:47Z)
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Scalable Bayesian inference for the generalized linear mixed model [2.45365913654612]
We introduce a statistical inference algorithm at the intersection of AI and Bayesian inference.
Our algorithm is an extension of gradient MCMC with novel contributions that address the treatment of correlated data.
We apply our algorithm to a large electronic health records database.
arXiv Detail & Related papers (2024-03-05T14:35:34Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks [69.28125286491502]
We introduce MFABA, an attribution algorithm that adheres to axioms.
Results demonstrate its superiority, achieving more than 101.5142 times the speed of state-of-the-art attribution algorithms.
arXiv Detail & Related papers (2023-12-21T07:48:15Z)
- Distributed Semi-Supervised Sparse Statistical Inference [6.685997976921953]
A debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters.
Traditional methods require computing a debiased estimator on every machine.
An efficient multi-round distributed debiased estimator, which integrates both labeled and unlabelled data, is developed.
arXiv Detail & Related papers (2023-06-17T17:30:43Z)
- Fast and Robust Sparsity Learning over Networks: A Decentralized Surrogate Median Regression Approach [10.850336820582678]
We propose a decentralized surrogate median regression (deSMR) method for efficiently solving the decentralized sparsity learning problem.
Our proposed algorithm enjoys a linear convergence rate with a simple implementation.
We also establish the theoretical results for sparse support recovery.
arXiv Detail & Related papers (2022-02-11T08:16:01Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Communication-efficient Byzantine-robust distributed learning with statistical guarantee [2.8407238889624877]
Communication efficiency and robustness are two major issues in modern distributed learning frameworks.
This paper develops two communication-efficient and robust distributed learning algorithms for convex problems.
arXiv Detail & Related papers (2021-02-28T01:38:37Z)
- Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
The support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods.
arXiv Detail & Related papers (2021-02-23T20:22:56Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.