Sparse Linear Centroid-Encoder: A Convex Method for Feature Selection
- URL: http://arxiv.org/abs/2306.04824v2
- Date: Fri, 9 Jun 2023 04:06:36 GMT
- Title: Sparse Linear Centroid-Encoder: A Convex Method for Feature Selection
- Authors: Tomojit Ghosh, Michael Kirby, Karim Karimov
- Abstract summary: We present Sparse Linear Centroid-Encoder (SLCE), a novel feature selection technique.
The algorithm uses a linear transformation to reconstruct a point as its class centroid and, at the same time, filters out unnecessary features with an $\ell_1$-norm penalty.
- Score: 1.057079240576682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel feature selection technique, Sparse Linear
Centroid-Encoder (SLCE). The algorithm uses a linear transformation to
reconstruct a point as its class centroid and, at the same time, uses the
$\ell_1$-norm penalty to filter out unnecessary features from the input data.
The original formulation of the optimization problem is nonconvex, but we
propose a two-step approach, where each step is convex. In the first step, we
solve the linear Centroid-Encoder, a convex optimization problem over a matrix
$A$. In the second step, we only search for a sparse solution over a diagonal
matrix $B$ while keeping $A$ fixed. Unlike other linear methods, e.g., Sparse
Support Vector Machines and Lasso, Sparse Linear Centroid-Encoder uses a single
model for multi-class data. We present an in-depth empirical analysis of the
proposed model and show that it promotes sparsity on various data sets,
including high-dimensional biological data. Our experimental results show that
SLCE has a performance advantage over some state-of-the-art neural
network-based feature selection techniques.
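The two-step approach in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the idea (least-squares centroid-encoder, then proximal gradient over the diagonal of $B$), not the authors' implementation; the learning rate, iteration count, and soft-thresholding solver are my own assumptions.

```python
import numpy as np

def slce_two_step(X, y, lam=0.05, n_iter=300, lr=1e-3):
    """Two-step SLCE sketch. X: (n, d) data, y: (n,) integer class labels.
    Returns the diagonal of B (per-feature weights) and the linear map A."""
    # Target matrix C: row i is the centroid of x_i's class.
    centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    C = np.vstack([centroids[c] for c in y])

    # Step 1 (convex): linear centroid-encoder, A = argmin ||X A - C||_F^2.
    A, *_ = np.linalg.lstsq(X, C, rcond=None)

    # Step 2 (convex): keep A fixed and search over b = diag(B), minimising
    # ||(X * b) A - C||_F^2 + lam * ||b||_1 by proximal gradient descent.
    b = np.ones(X.shape[1])
    for _ in range(n_iter):
        M = (X * b) @ A - C                            # residual
        grad = 2 * np.einsum('nj,nk,jk->j', X, M, A)   # d/db of the smooth part
        z = b - lr * grad                              # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)  # soft-threshold
    return b, A
```

Features whose entry in `b` is driven to zero by the $\ell_1$ penalty are discarded; the surviving entries give a feature ranking.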
Related papers
- Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder [1.2487990897680423]
We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroid-Encoder (SABCE), for determining the features that discriminate between two or more classes.
The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data.
arXiv Detail & Related papers (2023-06-07T21:37:21Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time.
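The $q$-fold column-wise tensor product referenced above is the Khatri–Rao product: column $j$ of the result is the Kronecker product of column $j$ of every input matrix. A minimal NumPy sketch (function name and shapes are my own, not the paper's):

```python
import numpy as np

def columnwise_tensor(mats):
    """q-fold column-wise tensor (Khatri-Rao) product of matrices sharing a
    column count: column j of the output is kron of every input's column j."""
    out = mats[0]
    for M in mats[1:]:
        # out[i, j, r] = out[i, r] * M[j, r], then flatten the row axes.
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, M.shape[1])
    return out
```

For inputs of heights $m_1,\dots,m_q$ the result has $\prod_i m_i$ rows, which is why input-sparsity-time sampling of the associated Gram matrix is nontrivial.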
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Sparse Centroid-Encoder: A Nonlinear Model for Feature Selection [1.2487990897680423]
We develop a sparse implementation of the centroid-encoder for nonlinear data reduction and visualization called Sparse Centroid-Encoder.
We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set.
The algorithm is applied to a wide variety of data sets including, single-cell biological data, high dimensional infectious disease data, hyperspectral data, image data, and speech data.
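The occurrence-based ranking step described above can be sketched as follows; the survival tolerance and the validation hook are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def rank_by_occurrence(weight_runs, tol=1e-6):
    """weight_runs: list of (d,) sparse weight vectors from repeated fits.
    A feature's score is the number of runs in which it survives (|w| > tol)."""
    occurrence = sum((np.abs(w) > tol).astype(int) for w in weight_runs)
    return np.argsort(-occurrence, kind='stable'), occurrence

def choose_num_features(ranking, validate, k_grid):
    """Pick the feature count maximising a user-supplied validation score,
    evaluated on the top-k ranked features."""
    return max(k_grid, key=lambda k: validate(ranking[:k]))
```

Ranking by occurrence across runs makes the selection robust to the randomness of any single fit, and the validation sweep replaces a hand-tuned sparsity threshold.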
arXiv Detail & Related papers (2022-01-30T20:46:24Z) - Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for
Linear Programming [5.126924253766052]
We show that the proposed method exhibits a linear convergence rate for solving sharp instances with a high probability.
We also propose an efficient coordinate-based oracle for unconstrained bilinear problems.
arXiv Detail & Related papers (2021-11-10T04:56:38Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non-convex optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations, as in the robust point matching (RPM) algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
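The penalised reconstruction objective can be written down in a few lines. The exact loss form below is an assumption based on standard sparse-PCA formulations (reconstruction error plus a row-sparsity penalty), not code from the paper:

```python
import numpy as np

def l2p_norm_p(W, p=0.5):
    """||W||_{2,p}^p: sum of p-th powers of the row-wise l2 norms.
    Rows of W correspond to features; a zero row drops that feature."""
    return float((np.linalg.norm(W, axis=1) ** p).sum())

def objective(X, W, lam=0.1, p=0.5):
    """Reconstruction error ||X - X W W^T||_F^2 plus the l_{2,p} penalty."""
    recon = X - X @ W @ W.T
    return float((recon ** 2).sum()) + lam * l2p_norm_p(W, p)
```

With $0 < p < 1$ the penalty pushes entire rows of $W$ to zero, so unselected features contribute nothing to the projection.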
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - On the Adversarial Robustness of LASSO Based Feature Selection [72.54211869067979]
In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix.
We formulate the modification strategy of the adversary as a bi-level optimization problem.
Numerical examples with synthetic and real data illustrate that our method is efficient and effective.
arXiv Detail & Related papers (2020-10-20T05:51:26Z) - Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization
Minimization [6.41804410246642]
Feature selection is an important computational process in data mining and machine learning.
In this paper, a novel method based on homotopy iterative hard threshold (HIHT) is proposed to solve the least square problem for multi-class feature selection.
arXiv Detail & Related papers (2020-10-08T02:06:06Z) - Learning nonlinear dynamical systems from a single trajectory [102.60042167341956]
We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^\star x_t)+\varepsilon_t$.
We give an algorithm that recovers the weight matrix $\Theta^\star$ from a single trajectory with optimal sample complexity and linear running time.
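A toy sketch of the system class in this summary: simulate $x_{t+1}=\sigma(\Theta^\star x_t)+\varepsilon_t$ with $\sigma=\mathrm{ReLU}$, then fit $\Theta$ by gradient descent on the one-step squared loss. This is an illustrative estimator with an assumed least-squares warm start, not the paper's algorithm.

```python
import numpy as np

def simulate(theta_star, T, noise=0.01, seed=0):
    """Roll out x_{t+1} = relu(theta_star @ x_t) + eps_t for T steps."""
    rng = np.random.default_rng(seed)
    xs = [rng.normal(size=theta_star.shape[0])]
    for _ in range(T):
        xs.append(np.maximum(theta_star @ xs[-1], 0.0)
                  + noise * rng.normal(size=theta_star.shape[0]))
    return np.array(xs)

def fit_theta(xs, n_iter=500, lr=0.05):
    """Gradient descent on mean_t ||relu(theta @ x_t) - x_{t+1}||^2,
    initialised at the ordinary least-squares (linear) fit."""
    X, Y = xs[:-1], xs[1:]
    theta = np.linalg.lstsq(X, Y, rcond=None)[0].T   # linear warm start
    for _ in range(n_iter):
        pred = np.maximum(X @ theta.T, 0.0)
        # ReLU subgradient: rows only contribute where the unit is active.
        grad = 2 * ((pred - Y) * (pred > 0)).T @ X / len(X)
        theta -= lr * grad
    return theta
```

The single-trajectory setting is what makes the problem hard: the $x_t$ are dependent samples, whereas this sketch treats them as a plain regression dataset.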
arXiv Detail & Related papers (2020-04-30T10:42:48Z) - Efficient Algorithms for Multidimensional Segmented Regression [42.046881924063044]
We study the fundamental problem of fixed-design multidimensional segmented regression.
We provide the first sample and computationally efficient algorithm for this problem in any fixed dimension.
Our algorithm relies on a simple merging iterative approach, which is novel in the multidimensional setting.
arXiv Detail & Related papers (2020-03-24T19:39:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.