Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms
- URL: http://arxiv.org/abs/2406.01149v1
- Date: Mon, 3 Jun 2024 09:43:24 GMT
- Title: Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms
- Authors: Avishek Ghosh, Arya Mazumdar
- Abstract summary: Mixed linear regression is a well-studied problem in statistics and machine learning.
In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without assuming a generative model.
We show that the AM and EM algorithms lead to agnostic learning in mixed linear regression by converging to the population loss minimizers.
- Score: 22.79595679373698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples, tuples of covariates and labels, the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying this chosen function to the covariates, and potentially introducing noise to the result. In that situation, the objective is to estimate the ground-truth linear functions up to some parameter error. The popular expectation maximization (EM) and alternating minimization (AM) algorithms have been previously analyzed for this. In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without such generative models. In particular, we show that the AM and EM algorithms, under standard conditions of separability and good initialization, lead to agnostic learning in mixed linear regression by converging to the population loss minimizers, for suitably defined loss functions. In some sense, this shows the strength of the AM and EM algorithms, which converge to "optimal solutions" even in the absence of realizable generative models.
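At a high level, the AM scheme alternates between assigning each sample to its current best-fitting line and refitting each line by least squares. The minimal sketch below illustrates this on arbitrary (not necessarily realizable) data; the number of components, iteration count, and random initialization are chosen purely for illustration, whereas the paper's guarantees assume separability and a good initialization.

```python
import numpy as np

def am_mixed_linear_regression(X, y, k=2, n_iters=50, seed=0):
    """Alternating minimization for mixed linear regression (minimal sketch).

    Alternates between (i) assigning each sample to its best-fitting line and
    (ii) refitting each line by least squares on its assigned samples.
    No generative model is assumed: the procedure simply descends the
    min-over-components squared loss on the given samples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # The paper's analysis assumes a good (separated) initialization;
    # here we just draw random regressors for illustration.
    betas = rng.standard_normal((k, d))
    for _ in range(n_iters):
        # Step (i): assign each sample to the component with smallest residual.
        residuals = (y[:, None] - X @ betas.T) ** 2      # shape (n, k)
        assign = residuals.argmin(axis=1)
        # Step (ii): least-squares refit of each component on its samples.
        for j in range(k):
            idx = assign == j
            if idx.sum() >= d:
                betas[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return betas

# Usage on data that need not come from any generative MLR model.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = np.where(X[:, 0] > 0,
             X @ np.array([1.0, 2.0, 0.0]),
             X @ np.array([-1.0, 0.0, 2.0])) + 0.1 * rng.standard_normal(500)
print(am_mixed_linear_regression(X, y))
```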
Related papers
- Global Convergence of Online Identification for Mixed Linear Regression [1.9295130374196499]
Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships.
This paper investigates the online identification and data clustering problems for two basic classes of MLRs.
It introduces two corresponding new online identification algorithms based on the expectation-maximization principle.
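For intuition, a generic streaming, EM-style update for a two-component MLR is sketched below; the soft responsibilities, step size, and noise scale are illustrative assumptions, and this is not the identification algorithms introduced in that paper.

```python
import numpy as np

def online_em_2mlr_step(beta, x, y, step=0.05, sigma2=1.0):
    """One streaming EM-style update for a 2-component mixed linear regression.

    E-step: soft responsibilities of the two regressors for sample (x, y),
            assuming Gaussian noise with variance sigma2 and equal weights.
    M-step: stochastic-gradient update of each regressor, weighted by its
            responsibility. (Illustrative only; not the paper's algorithm.)
    """
    preds = beta @ x                                    # shape (2,)
    log_lik = -(y - preds) ** 2 / (2 * sigma2)
    resp = np.exp(log_lik - log_lik.max())
    resp /= resp.sum()                                  # responsibilities
    grad = resp[:, None] * (preds - y)[:, None] * x     # per-component gradient
    return beta - step * grad

# Stream samples one at a time.
rng = np.random.default_rng(0)
beta = rng.standard_normal((2, 3))
true = np.array([[1.0, -2.0, 0.5], [-1.0, 1.0, 2.0]])
for _ in range(20000):
    x = rng.standard_normal(3)
    y = true[rng.integers(2)] @ x + 0.1 * rng.standard_normal()
    beta = online_em_2mlr_step(beta, x, y)
print(beta)
```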
arXiv Detail & Related papers (2023-11-30T12:30:42Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
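The inverse-free ingredient can be illustrated with a plain conjugate-gradient solve that never forms a matrix inverse; the sketch below covers only that piece and omits the Monte Carlo sampling and the backpropagation through solver iterations described in the abstract.

```python
import numpy as np

def conjugate_gradient(A, b, n_iters=20, tol=1e-10):
    """Solve A x = b for a symmetric positive-definite A without forming A^{-1}.

    Unrolling a fixed number of such iterations (and differentiating through
    them) is the kind of inverse-free computation the abstract refers to.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Compare against an explicit solve on a small SPD system.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)       # well-conditioned SPD matrix
b = rng.standard_normal(50)
print(np.allclose(conjugate_gradient(A, b, n_iters=50), np.linalg.solve(A, b), atol=1e-6))
```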
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Mixed Regression via Approximate Message Passing [16.91276351457051]
We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables.
In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one.
In max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector.
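The two observation models differ only in how the $L$ signal vectors produce each label; a minimal data-generation sketch with Gaussian covariates and hypothetical parameters makes the contrast concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, L = 1000, 5, 3
X = rng.standard_normal((n, d))
B = rng.standard_normal((L, d))          # L signal vectors (regressors)
c = rng.standard_normal(L)               # intercepts for the affine functions

# Mixed linear regression: each label comes from ONE of the L regressors,
# chosen by a latent variable we never observe.
z = rng.integers(L, size=n)
y_mixed = np.einsum('nd,nd->n', X, B[z])

# Max-affine regression: each label is the MAXIMUM of the L affine functions.
y_maxaffine = (X @ B.T + c).max(axis=1)
```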
arXiv Detail & Related papers (2023-04-05T04:59:59Z) - Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - On Learning Mixture of Linear Regressions in the Non-Realizable Setting [44.307245411703704]
We show that a mixture of linear regressions (MLR) can be used for prediction, where instead of predicting a single label the model predicts a list of values.
In this paper we show that a version of the popular alternating minimization (AM) algorithm finds the best-fit lines in a dataset even when a realizable model is not assumed.
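In the list-prediction view, the fitted model outputs all candidate values for a covariate and is charged only for the best one; a minimal sketch of that evaluation, with hypothetical fitted lines `betas`, is:

```python
import numpy as np

def list_prediction_loss(betas, X, y):
    """Min-over-list squared loss: for each covariate, predict the list
    (x @ beta_1, ..., x @ beta_L) and charge only the best entry."""
    preds = X @ betas.T                          # shape (n, L): one value per line
    return np.min((preds - y[:, None]) ** 2, axis=1).mean()

# Usage with hypothetical fitted lines `betas` of shape (L, d):
# loss = list_prediction_loss(betas, X_test, y_test)
```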
arXiv Detail & Related papers (2022-05-26T05:34:57Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying an integrity metric: the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
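The integrity criterion can be checked numerically: after fitting on the training split, the empirical test error should be close to the measurement-noise level. A minimal sanity check on synthetic data, with hypothetical split sizes (not the paper's optimal sizing), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
d, noise_var = 10, 0.25
n_train, n_test = 200, 100                      # hypothetical split sizes
beta = rng.standard_normal(d)

X = rng.standard_normal((n_train + n_test, d))
y = X @ beta + np.sqrt(noise_var) * rng.standard_normal(n_train + n_test)
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

beta_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
test_mse = np.mean((X_te @ beta_hat - y_te) ** 2)
# Integrity: the empirical test error should be close to the true noise variance.
print(test_mse, noise_var)
```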
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
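A minimal sketch of the basic setting, constant-stepsize single-sample SGD on the squared loss compared against the ordinary least squares solution, under assumed Gaussian data (not the paper's analysis), is:

```python
import numpy as np

def constant_step_sgd(X, y, step=0.01, n_passes=5, seed=0):
    """Constant-stepsize SGD for unregularized least-squares linear regression,
    one sample per update: the basic setting studied in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        for i in rng.permutation(n):
            w -= step * (X[i] @ w - y[i]) * X[i]
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(2000)
w_sgd = constant_step_sgd(X, y)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.norm(w_sgd - w_ols))
```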
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Piecewise linear regression and classification [0.20305676256390928]
This paper proposes a method for solving multivariate regression and classification problems using piecewise linear predictors.
A Python implementation of the algorithm described in this paper is available at http://cse.lab.imtlucca.it/bemporad/parc.
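As a generic illustration of piecewise linear prediction (not the algorithm or API of the parc package linked above), one can partition the covariate space with centroids and fit one affine model per cell:

```python
import numpy as np

def piecewise_linear_fit(X, y, n_pieces=3, n_iters=20, seed=0):
    """Generic piecewise-linear regression sketch: partition the covariate
    space with K-means-style centroids, fit one affine model per cell; a new
    point x is predicted with the model of its nearest centroid, so the
    fitted function is piecewise linear."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, n_pieces, replace=False)]
    Xb = np.hstack([X, np.ones((n, 1))])             # intercept column
    coefs = np.zeros((n_pieces, d + 1))
    for _ in range(n_iters):
        # Assign each sample to its nearest centroid (defines the pieces).
        dist = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(n_pieces):
            idx = assign == j
            if idx.sum() > d:
                centroids[j] = X[idx].mean(axis=0)
                coefs[j], *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
    return centroids, coefs
```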
arXiv Detail & Related papers (2021-03-10T17:07:57Z) - Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models.
We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample complexity.
In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z) - A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new algorithm, with running time between linear and quadratic, for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z)