Sparse Mixed Linear Regression with Guarantees: Taming an Intractable
Problem with Invex Relaxation
- URL: http://arxiv.org/abs/2206.01167v1
- Date: Thu, 2 Jun 2022 17:38:11 GMT
- Authors: Adarsh Barik, Jean Honorio
- Abstract summary: In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset.
We provide a novel invex relaxation for this intractable problem which leads to a solution with provable theoretical guarantees.
- Score: 31.061339148448006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of sparse mixed linear regression on an
unlabeled dataset that is generated from linear measurements from two different
regression parameter vectors. Since the data is unlabeled, our task is not only
to figure out a good approximation of the regression parameter vectors but also
to label the dataset correctly. In its original form, this problem is NP-hard.
The most popular algorithms for this problem (such as
Expectation-Maximization) have a tendency to get stuck in local minima. We provide
a novel invex relaxation for this intractable problem which leads to a solution
with provable theoretical guarantees. This relaxation enables exact recovery of
data labels. Furthermore, we recover a close approximation of the regression
parameter vectors which match the true parameter vectors in support and sign.
Our formulation uses a carefully constructed primal dual witnesses framework
for the invex problem. Furthermore, we show that the sample complexity of our
method is only logarithmic in terms of the dimension of the regression
parameter vectors.
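To make the problem setup concrete, the following is a minimal sketch of the data-generating model described in the abstract: an unlabeled dataset of linear measurements from two sparse regression parameter vectors, where the hidden label of each sample must also be recovered. All sizes, supports, and noise levels here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5  # samples, dimension, sparsity (illustrative sizes)

# Two k-sparse regression parameter vectors.
beta1 = np.zeros(d)
beta1[:k] = 2.0 * rng.choice([-1.0, 1.0], k)
beta2 = np.zeros(d)
beta2[:k] = -beta1[:k]

X = rng.standard_normal((n, d))
z = rng.integers(0, 2, n)  # hidden labels: which vector generated each row
betas = np.stack([beta1, beta2])
y = np.einsum("ij,ij->i", X, betas[z]) + 0.1 * rng.standard_normal(n)

# The task: from (X, y) alone, recover the labels z exactly and estimate
# beta1, beta2 up to matching support and sign.
```

The logarithmic sample complexity claimed in the abstract refers to the dimension d: the number of measurements needed scales with log(d) rather than d itself, which is what makes the sparse setting tractable.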
Related papers
- Shuffled linear regression through graduated convex relaxation [12.614901374282868]
The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between input and output is unknown.
This problem arises in a wide range of applications including survey data.
We propose a novel optimization algorithm for shuffled linear regression based on a posterior-maximizing objective function.
arXiv Detail & Related papers (2022-09-30T17:33:48Z) - Hardness and Algorithms for Robust and Sparse Optimization [17.842787715567436]
We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression.
Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x \in \mathbb{R}^d$ to minimize $\|Ax-b\|$.
The robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ to minimize $\|(Ax-b)_S\|$.
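The two objectives above can be written down directly; the sketch below evaluates them for a fixed candidate $x$ (it does not solve either problem, which the paper shows is hard). The helper names and the toy data are assumptions for illustration.

```python
import numpy as np

def sparse_ls_objective(A, b, x, k):
    """||Ax - b|| if x is k-sparse; infeasible (np.inf) otherwise."""
    if np.count_nonzero(x) > k:
        return np.inf
    return np.linalg.norm(A @ x - b)

def robust_ls_objective(A, b, x, k):
    """||(Ax - b)_S|| where S drops the k rows with largest residual."""
    r = np.abs(A @ x - b)
    keep = np.sort(r)[: len(r) - k]  # ignore the k worst-fitting rows
    return np.linalg.norm(keep)

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 100.0])  # last measurement is an outlier
x = np.array([1.0, 2.0])

# Dropping one row (k=1) removes the outlier's residual entirely,
# while the plain sparse objective pays for it in full.
```

The hardness results in the paper concern choosing the support of $x$ (sparse case) or the set $S$ (robust case), both of which are combinatorial; evaluating the objectives, as above, is easy.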
arXiv Detail & Related papers (2022-06-29T01:40:38Z) - Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z) - A Precise Performance Analysis of Support Vector Regression [105.94855998235232]
We study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements.
Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms.
arXiv Detail & Related papers (2021-05-21T14:26:28Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
By "weakly" we mean that the labels are known for some training points, unknown for others, and uncertain for the rest due to random noise or other reasons such as a lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Fair Sparse Regression with Clustering: An Invex Relaxation for a
Combinatorial Problem [32.18449686637963]
We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance.
We simultaneously solve the clustering problem by recovering the exact value of the hidden attribute for each sample.
arXiv Detail & Related papers (2021-02-19T01:46:34Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z) - A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new linear up to quadratic time algorithm for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
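The replay idea in this summary can be sketched generically: on Markovian data, consecutive samples are correlated, so instead of taking an SGD step on the newest sample, steps are taken on mini-batches drawn uniformly from a buffer of recent samples, which breaks the temporal correlation. This is a minimal illustration of the experience-replay mechanism, not the paper's actual algorithm or analysis; all constants are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, buf_size, batch, lr = 5, 2000, 500, 16, 0.01
beta = rng.standard_normal(d)  # ground-truth regression vector

x = np.zeros(d)
buffer, w = [], np.zeros(d)
for t in range(T):
    # AR(1) covariates: a Markov chain with identity stationary covariance.
    x = 0.9 * x + np.sqrt(1 - 0.9**2) * rng.standard_normal(d)
    y = x @ beta + 0.1 * rng.standard_normal()
    buffer.append((x.copy(), y))
    if len(buffer) > buf_size:
        buffer.pop(0)
    # Least-squares SGD step on a uniformly sampled mini-batch from the buffer.
    idx = rng.integers(0, len(buffer), batch)
    gx = np.stack([buffer[i][0] for i in idx])
    gy = np.array([buffer[i][1] for i in idx])
    w -= lr * gx.T @ (gx @ w - gy) / batch
```

Sampling uniformly from the buffer rather than consuming the stream in order is what mimics i.i.d. data; the paper's contribution is showing this yields a provably better error rate under mixing-time assumptions.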
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or by modifying the original data points.
Given the energy budget, we first provide the closed-form solution of the optimal poisoning data point when our target is modifying one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while keeping the changes to the others as small as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.