Quasi-Newton Updating for Large-Scale Distributed Learning
- URL: http://arxiv.org/abs/2306.04111v2
- Date: Sun, 11 Jun 2023 14:06:40 GMT
- Title: Quasi-Newton Updating for Large-Scale Distributed Learning
- Authors: Shuyuan Wu, Danyang Huang, Hansheng Wang
- Abstract summary: We develop a distributed quasi-Newton (DQN) framework with excellent statistical, computational, and communication efficiency.
No Hessian matrix inversion or communication is needed in the DQN method.
We theoretically demonstrate that the resulting estimator is statistically efficient over a small number of iterations under mild conditions.
- Score: 0.32228025627337864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed computing is critically important for modern statistical
analysis. Herein, we develop a distributed quasi-Newton (DQN) framework with
excellent statistical, computational, and communication efficiency. In the DQN
method, no Hessian matrix inversion or communication is needed. This
considerably reduces the computation and communication complexity of the
proposed method. Notably, related existing methods only analyze numerical
convergence and require a diverging number of iterations to converge. However,
we investigate the statistical properties of the DQN method and theoretically
demonstrate that the resulting estimator is statistically efficient over a
small number of iterations under mild conditions. Extensive numerical analyses
demonstrate the finite-sample performance of the proposed method.
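The abstract describes the approach only at a high level. As a loose illustration of the general pattern it builds on (a minimal sketch, not the paper's DQN estimator), the snippet below partitions simulated least-squares data across a few hypothetical "workers", has each worker return only its local gradient, and lets the driver apply a standard BFGS-style rank-two update to an inverse-Hessian approximation, so no Hessian is ever formed, inverted, or communicated. All names (`shards`, `local_gradient`, the worker count, the least-squares objective) are illustrative assumptions.

```python
# Minimal sketch (not the paper's DQN algorithm): workers hold data shards and
# return only local gradients; the driver aggregates them and applies a
# BFGS-style rank-two update to an inverse-Hessian approximation H, so no
# Hessian is formed, inverted, or communicated. Toy problem: least squares.
import numpy as np

rng = np.random.default_rng(0)

# Simulate data partitioned across K hypothetical workers.
K, n_per, p = 4, 500, 5
beta_true = rng.normal(size=p)
shards = []
for _ in range(K):
    X = rng.normal(size=(n_per, p))
    y = X @ beta_true + 0.1 * rng.normal(size=n_per)
    shards.append((X, y))
N = K * n_per

def local_gradient(shard, beta):
    """Gradient of the local least-squares loss; the only message a worker sends."""
    X, y = shard
    return X.T @ (X @ beta - y) / N

def full_gradient(beta):
    """Driver-side aggregation: a simple sum of the K worker gradients."""
    return sum(local_gradient(s, beta) for s in shards)

# Quasi-Newton loop on the driver.
beta = np.zeros(p)
H = np.eye(p)                      # inverse-Hessian approximation, never inverted
g = full_gradient(beta)
for it in range(20):
    step = -H @ g                  # quasi-Newton search direction
    beta_new = beta + step
    g_new = full_gradient(beta_new)
    s, ydiff = beta_new - beta, g_new - g
    sy = s @ ydiff
    if sy > 1e-12:                 # standard BFGS curvature safeguard
        rho = 1.0 / sy
        I = np.eye(p)
        H = (I - rho * np.outer(s, ydiff)) @ H @ (I - rho * np.outer(ydiff, s)) \
            + rho * np.outer(s, s)
    beta, g = beta_new, g_new
    if np.linalg.norm(g) < 1e-8:
        break

print("estimation error:", np.linalg.norm(beta - beta_true))
```

The curvature check `sy > 0` is the usual BFGS safeguard; the paper's actual updating rule, its iteration count, and the conditions under which the resulting estimator is statistically efficient are specified in the manuscript and may differ from this sketch.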
Related papers
- Statistical Inference in Tensor Completion: Optimal Uncertainty Quantification and Statistical-to-Computational Gaps [7.174572371800217]
This paper presents a simple yet efficient method for statistical inference of tensor linear forms using incomplete and noisy observations.
It is suitable for various statistical inference tasks, including constructing confidence intervals, inference under heteroskedastic and sub-exponential noise, and simultaneous testing.
arXiv Detail & Related papers (2024-10-15T03:09:52Z)
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Large Dimensional Independent Component Analysis: Statistical Optimality and Computational Tractability [13.104413212606577]
We investigate the optimal statistical performance and the impact of computational constraints for independent component analysis (ICA).
We show that the optimal sample complexity is linear in dimensionality.
We develop computationally tractable estimates that attain both the optimal sample complexity and minimax optimal rates of convergence.
arXiv Detail & Related papers (2023-03-31T15:46:30Z)
- High Dimensional Statistical Estimation under One-bit Quantization [27.718986773043643]
One-bit (binary) data are preferable in many applications because of the efficiency in signal storage, processing, transmission, and enhancement of privacy.
In this paper, we study three fundamental statistical estimation problems.
Under both sub-Gaussian and heavy-tailed regimes, new estimators that handle high-dimensional scaling are proposed.
arXiv Detail & Related papers (2022-02-26T15:13:04Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number and frequency of required communication rounds compared to existing methods, without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
- Quantum Algorithms for Data Representation and Analysis [68.754953879193]
We provide quantum procedures that speed-up the solution of eigenproblems for data representation in machine learning.
The power and practical use of these subroutines is shown through new quantum algorithms, sublinear in the input matrix's size, for principal component analysis, correspondence analysis, and latent semantic analysis.
Results show that the run-time parameters that do not depend on the input's size are reasonable and that the error on the computed model is small, allowing for competitive classification performances.
arXiv Detail & Related papers (2021-04-19T00:41:43Z)
- Stochastic Approximation for Online Tensorial Independent Component Analysis [98.34292831923335]
Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing.
In this paper, we present a by-product online tensorial algorithm that estimates each independent component.
arXiv Detail & Related papers (2020-12-28T18:52:37Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that ties statistical accuracy to the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)