EM for Mixture of Linear Regression with Clustered Data
- URL: http://arxiv.org/abs/2308.11518v1
- Date: Tue, 22 Aug 2023 15:47:58 GMT
- Title: EM for Mixture of Linear Regression with Clustered Data
- Authors: Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar
- Abstract summary: We discuss how the underlying clustered structures in distributed data can be exploited to improve learning schemes.
We employ the well-known Expectation-Maximization (EM) method to estimate the maximum likelihood parameters from $m$ batches of dependent samples.
We show that if initialized properly, EM on the structured data requires only $O(1)$ iterations to reach the same statistical accuracy, as long as $m$ grows as $e^{o(n)}$.
- Score: 6.948976192408852
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Modern data-driven and distributed learning frameworks deal with diverse
massive data generated by clients spread across heterogeneous environments.
Indeed, data heterogeneity is a major bottleneck in scaling up many distributed
learning paradigms. In many settings, however, heterogeneous data may be
generated in clusters with shared structures, as is the case in several
applications such as federated learning where a common latent variable governs
the distribution of all the samples generated by a client. It is therefore
natural to ask how the underlying clustered structures in distributed data can
be exploited to improve learning schemes. In this paper, we tackle this
question in the special case of estimating $d$-dimensional parameters of a
two-component mixture of linear regressions problem where each of $m$ nodes
generates $n$ samples with a shared latent variable. We employ the well-known
Expectation-Maximization (EM) method to estimate the maximum likelihood
parameters from $m$ batches of dependent samples each containing $n$
measurements. Discarding the clustered structure in the mixture model, EM is
known to require $O(\log(mn/d))$ iterations to reach the statistical accuracy
of $O(\sqrt{d/(mn)})$. In contrast, we show that if initialized properly, EM on
the structured data requires only $O(1)$ iterations to reach the same
statistical accuracy, as long as $m$ grows as $e^{o(n)}$. Our analysis
establishes and combines novel asymptotic optimization and generalization
guarantees for population and empirical EM with dependent samples, which may be
of independent interest.
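To make the role of the clustered structure concrete, below is a minimal NumPy sketch of an EM iteration for the symmetric two-component case, under the illustrative model $y_{ij} = z_i \langle x_{ij}, \theta^\star \rangle + \varepsilon_{ij}$, where a single latent sign $z_i \in \{\pm 1\}$ is shared by all $n$ samples of node $i$ and the noise is Gaussian with variance $\sigma^2$. The function name and the noise model are assumptions made for illustration, not the paper's exact formulation; the key point the sketch highlights is that the E-step pools all $n$ measurements of a batch into a single posterior weight, so the latent signs are resolved quickly as $n$ grows.

```python
import numpy as np

def em_clustered_mlr(X, Y, theta0, sigma2=1.0, n_iters=5):
    """Sketch of EM for a symmetric 2-component mixture of linear
    regressions with one shared latent sign per batch (node).

    Assumed model (illustrative): y_ij = z_i * <x_ij, theta*> + N(0, sigma2),
    with z_i = +/-1 shared across all n samples of node i.

    X: (m, n, d) covariates; Y: (m, n) responses; theta0: (d,) initial guess.
    """
    theta = theta0.copy()
    m, n, d = X.shape
    # The M-step design (Gram) matrix does not depend on the posterior
    # weights, so it can be precomputed once.
    Xf = X.reshape(m * n, d)
    gram = Xf.T @ Xf
    for _ in range(n_iters):
        # E-step: posterior P(z_i = +1 | batch i) pools the evidence of
        # all n samples in the batch, so it concentrates as n grows.
        scores = np.einsum('mnd,d->mn', X, theta)             # <x_ij, theta>
        logits = (2.0 / sigma2) * np.sum(Y * scores, axis=1)  # one per batch
        w = 1.0 / (1.0 + np.exp(-logits))
        # M-step: closed-form weighted least squares; each batch enters
        # with the "soft sign" (2 w_i - 1).
        signed_Y = (2.0 * w - 1.0)[:, None] * Y               # (m, n)
        rhs = np.einsum('mn,mnd->d', signed_Y, X)
        theta = np.linalg.solve(gram, rhs)
    return theta
```

A quick way to exercise the sketch is to simulate data from the assumed model and run a few iterations from a warm start (the paper's guarantees require a proper initialization):

```python
m, n, d = 200, 20, 5
rng = np.random.default_rng(0)
theta_star = rng.normal(size=d)
z = rng.choice([-1.0, 1.0], size=m)             # shared latent sign per node
X = rng.normal(size=(m, n, d))
Y = z[:, None] * np.einsum('mnd,d->mn', X, theta_star)
Y += 0.1 * rng.normal(size=(m, n))
theta_hat = em_clustered_mlr(X, Y, theta0=theta_star + 0.5 * rng.normal(size=d))
```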
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Weighted Sparse Partial Least Squares for Joint Sample and Feature
Selection [7.219077740523681]
We propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection.
We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property.
arXiv Detail & Related papers (2023-08-13T10:09:25Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO) aims to learn a model that performs well over $m$ different distributions.
We make use of online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model.
The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression.
arXiv Detail & Related papers (2023-02-06T23:34:51Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Convergence for score-based generative modeling with polynomial
complexity [9.953088581242845]
We prove the first convergence guarantees for the core mechanic behind Score-based generative modeling.
Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality.
We show that a predictor-corrector gives better convergence than using either portion alone.
arXiv Detail & Related papers (2022-06-13T14:57:35Z) - A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective.
The goal is to predict the unknown values it takes at any other location with minimum quadratic risk.
We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
arXiv Detail & Related papers (2022-02-15T12:46:43Z) - Iterative Feature Matching: Toward Provable Domain Generalization with
Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high-dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs).
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z) - Algebraic and Analytic Approaches for Parameter Learning in Mixture
Models [66.96778152993858]
We present two different approaches for parameter learning in several mixture models in one dimension.
For some of these distributions, our results represent the first guarantees for parameter estimation.
arXiv Detail & Related papers (2020-01-19T05:10:56Z)