Fair Sparse Regression with Clustering: An Invex Relaxation for a
Combinatorial Problem
- URL: http://arxiv.org/abs/2102.09704v1
- Date: Fri, 19 Feb 2021 01:46:34 GMT
- Title: Fair Sparse Regression with Clustering: An Invex Relaxation for a
Combinatorial Problem
- Authors: Adarsh Barik and Jean Honorio
- Abstract summary: We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance.
We simultaneously solve the clustering problem by recovering the exact value of the hidden attribute for each sample.
- Score: 32.18449686637963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of fair sparse regression on a biased
dataset where bias depends upon a hidden binary attribute. The presence of a
hidden attribute adds an extra layer of complexity to the problem by combining
sparse regression and clustering with unknown binary labels. The corresponding
optimization problem is combinatorial but we propose a novel relaxation of it
as an \emph{invex} optimization problem. To the best of our knowledge, this is
the first invex relaxation for a combinatorial problem. We show that the
inclusion of the debiasing/fairness constraint in our model has no adverse
effect on the performance. Rather, it enables the recovery of the hidden
attribute. The support of our recovered regression parameter vector matches
exactly with the true parameter vector. Moreover, we simultaneously solve the
clustering problem by recovering the exact value of the hidden attribute for
each sample. Our method uses carefully constructed primal dual witnesses to
solve the combinatorial problem. We provide theoretical guarantees which hold
as long as the number of samples is polynomial in terms of the dimension of the
regression parameter vector.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression.
We propose selecting hyper parameters that minimize a weighted sum of a cross-validation metric and a model's output stability.
Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
arXiv Detail & Related papers (2023-06-26T17:02:45Z) - Shuffled linear regression through graduated convex relaxation [12.614901374282868]
The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between input and output is unknown.
This problem arises in a wide range of applications including survey data.
We propose a novel optimization algorithm for shuffled linear regression based on a posterior-maximizing objective function.
arXiv Detail & Related papers (2022-09-30T17:33:48Z) - Sparse Mixed Linear Regression with Guarantees: Taming an Intractable
Problem with Invex Relaxation [31.061339148448006]
In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset.
We provide a novel invex relaxation for this intractable problem which leads to a solution with provable theoretical guarantees.
arXiv Detail & Related papers (2022-06-02T17:38:11Z) - On Learning Mixture Models with Sparse Parameters [44.3425205248937]
We study mixtures with high dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors.
We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space.
arXiv Detail & Related papers (2022-02-24T07:44:23Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or modify the original data points.
Given the energy budget, we first provide the closed-form solution of the optimal poisoning data point when our target is modifying one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while making others to be changed as small as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.