Dual-sPLS: a family of Dual Sparse Partial Least Squares regressions for
feature selection and prediction with tunable sparsity; evaluation on
simulated and near-infrared (NIR) data
- URL: http://arxiv.org/abs/2301.07206v1
- Date: Tue, 17 Jan 2023 21:50:35 GMT
- Authors: Louna Alsouki, Laurent Duval, Clément Marteau, Rami El Haddad and François Wahl
- Abstract summary: The variant presented in this paper, Dual-sPLS, generalizes the classical PLS1 algorithm.
It provides balance between accurate prediction and efficient interpretation.
Code is provided as an open-source package in R.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relating a set of variables X to a response y is crucial in chemometrics. A
quantitative prediction objective can be enriched by qualitative data
interpretation, for instance by locating the most influential features. When
high-dimensional problems arise, dimension reduction techniques can be used.
Most notable are projections (e.g. Partial Least Squares, or PLS) and variable
selection (e.g. lasso). Sparse partial least squares (sPLS) combines both strategies
by blending variable selection into PLS. The variant presented in this paper,
Dual-sPLS, generalizes the classical PLS1 algorithm. It provides a balance
between accurate prediction and efficient interpretation. It is based on
penalizations inspired by classical regression methods (lasso, group lasso,
least squares, ridge) and uses the dual norm notion. The resulting sparsity is
enforced by an intuitive shrinking ratio parameter. Dual-sPLS favorably
compares to similar regression methods, on simulated and real chemical data.
Code is provided as an open-source package in R:
\url{https://CRAN.R-project.org/package=dual.spls}.
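For context, the dual norm of a norm $\Omega$ is $\Omega^*(z) = \max_{\Omega(v) \le 1} v^\top z$; Dual-sPLS uses penalized norms of this kind to build its weight vectors, with the shrinking ratio controlling how many coefficients are driven to zero. The minimal R sketch below shows how a lasso-type Dual-sPLS fit might look with the dual.spls package; the function name d.spls.lasso and its arguments ncp (number of latent components) and ppnu (shrinking ratio) are assumptions from my reading of the package documentation and should be checked against the current manual.

library(dual.spls)   # install.packages("dual.spls") from CRAN
set.seed(1)
n <- 100; p <- 200                      # more variables than observations
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% runif(5, 1, 2)) + rnorm(n, sd = 0.1)  # 5 influential features
# Assumed interface (check ?d.spls.lasso): ppnu close to 1 shrinks
# most coefficients to zero, giving a sparser, more interpretable model.
fit <- d.spls.lasso(X = X, y = y, ncp = 5, ppnu = 0.9)
str(fit)                                # inspect coefficients, scores, selected variables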
Related papers
- Highly Adaptive Ridge
We propose a regression method that achieves a $n^{-2/3}$ dimension-free $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate better empirical performance than state-of-the-art algorithms, in particular for small datasets.
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
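Since the summary identifies HAR with kernel ridge regression under a particular data-adaptive kernel, a minimal base-R sketch of generic kernel ridge regression may help; the Gaussian kernel below is for illustration only and is not HAR's saturated spline-basis kernel.

# Kernel ridge regression: alpha = (K + lambda I)^{-1} y,
# prediction f(x) = sum_i alpha_i k(x, x_i).
krr_fit <- function(X, y, lambda = 1e-2, gamma = 1) {
  K <- exp(-gamma * as.matrix(dist(X))^2)        # n x n Gram matrix
  list(alpha = solve(K + lambda * diag(nrow(X)), y), X = X, gamma = gamma)
}
krr_predict <- function(fit, Xnew) {
  # squared distances between new and training points
  D2 <- outer(rowSums(Xnew^2), rowSums(fit$X^2), "+") - 2 * Xnew %*% t(fit$X)
  drop(exp(-fit$gamma * D2) %*% fit$alpha)
}
set.seed(1)
X <- matrix(runif(80), ncol = 2)
y <- sin(4 * X[, 1]) + rnorm(40, sd = 0.1)
fit <- krr_fit(X, y)
head(krr_predict(fit, X))                        # fitted values at training points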
- Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression
We introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression.
Our approach applies to problems with structured, heavy tails, and low-rank structures.
We demonstrate our method through simulated and real data experiments.
arXiv Detail & Related papers (2023-09-14T15:58:30Z)
- Scalable Neural Symbolic Regression using Control Variables
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from the observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
arXiv Detail & Related papers (2023-06-07T18:30:25Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Partial Least Square Regression via Three-factor SVD-type Manifold Optimization for EEG Decoding
We propose a new method to solve partial least squares regression via optimization on the bi-Grassmann manifold (PLSRbiGr).
PLSRbiGr is validated with a variety of experiments for decoding EEG signals in motor imagery (MI) and steady-state visual evoked potential (SSVEP) tasks.
arXiv Detail & Related papers (2022-08-09T11:57:02Z)
- Scaling Structured Inference with Randomization
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- The MELODIC family for simultaneous binary logistic regression in a reduced space
We propose the MELODIC family for simultaneous binary logistic regression modeling.
The model may be interpreted in terms of logistic regression coefficients or in terms of a biplot.
Two applications are shown in detail: one relating personality characteristics to drug consumption profiles and one relating personality characteristics to depressive and anxiety disorders.
arXiv Detail & Related papers (2021-02-16T15:47:20Z)
- Nonlinear Distribution Regression for Remote Sensing Applications
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)
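As background for the entry above, a common recipe for kernel-based distribution regression is to summarize each group of observations by a kernel mean embedding and then regress the group-level response on those embeddings. The random Fourier feature sketch below illustrates that general idea; it is not the paper's exact estimator.

set.seed(1)
d <- 3; D <- 50                                   # input dim, random features
W <- matrix(rnorm(D * d), D, d); b <- runif(D, 0, 2 * pi)
phi <- function(x) sqrt(2 / D) * cos(W %*% x + b) # random Fourier features
embed_group <- function(Xg) rowMeans(apply(Xg, 1, phi))  # mean embedding of a group
# 30 groups of 20 observations; the response depends on each group's distribution
bags <- replicate(30, matrix(rnorm(20 * d, mean = runif(1, -1, 1)), ncol = d),
                  simplify = FALSE)
y <- sapply(bags, function(Xg) mean(Xg[, 1])^2) + rnorm(30, sd = 0.01)
Phi <- t(sapply(bags, embed_group))               # 30 x D matrix of embeddings
beta <- solve(crossprod(Phi) + 1e-3 * diag(D), crossprod(Phi, y))  # ridge step
head(drop(Phi %*% beta))                          # fitted group-level responses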
- Piecewise Linear Regression via a Difference of Convex Functions
We present a new piecewise linear regression methodology that fits a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
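To make the DC representation concrete: every continuous piecewise linear function can be written as a difference of two convex max-affine functions, $f(x) = \max_i (a_i^\top x + b_i) - \max_j (c_j^\top x + d_j)$. The R sketch below only evaluates such a representation; the paper's fitting procedure is not reproduced here.

# Evaluate f(x) = max_i(a_i'x + b_i) - max_j(c_j'x + d_j) on the rows of x
dc_eval <- function(x, A, b, C, d) {
  g <- apply(sweep(x %*% t(A), 2, b, "+"), 1, max)  # first convex (max-affine) part
  h <- apply(sweep(x %*% t(C), 2, d, "+"), 1, max)  # second convex part
  g - h
}
x <- matrix(seq(-2, 2, length.out = 9), ncol = 1)
A <- matrix(c(1, -1), 2, 1); b <- c(0, 0)           # g(x) = max(x, -x) = |x|
C <- matrix(0, 1, 1); d <- 0                        # h(x) = 0
dc_eval(x, A, b, C, d)                              # returns |x|, a piecewise linear function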
- Optimal Feature Manipulation Attacks Against Linear Regression
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or by modifying the original data points.
Given an energy budget, we first provide the closed-form solution of the optimal poisoning data point when the target is modifying one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while keeping the changes to the others as small as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)
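The closed-form attack itself is in the paper; as a simple illustration of the underlying phenomenon, the base-R snippet below shows how a single crafted high-leverage point can shift an OLS coefficient.

set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50, sd = 0.5)
coef(lm(y ~ x))                        # slope estimate close to 2
# append one high-leverage poisoning point pulling the slope down
x_p <- c(x, 10); y_p <- c(y, -20)
coef(lm(y_p ~ x_p))                    # slope visibly shifted by a single point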
This list is automatically generated from the titles and abstracts of the papers on this site.