Globally Convergent Accelerated Algorithms for Multilinear Sparse
Logistic Regression with $\ell_0$-constraints
- URL: http://arxiv.org/abs/2309.09239v1
- Date: Sun, 17 Sep 2023 11:05:08 GMT
- Title: Globally Convergent Accelerated Algorithms for Multilinear Sparse
Logistic Regression with $\ell_0$-constraints
- Authors: Weifeng Yang and Wenwen Min
- Abstract summary: Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data.
We propose an Accelerated Proximal Alternating Minim-MLSR model to solve the $ell_0$-MLSR.
We also demonstrate that APALM$+$ is globally convergent to a first-order critical point as well as to establish convergence by using the Kurdy-Lojasiewicz property.
- Score: 2.323238724742687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor data represents a multidimensional array. Regression methods based on
low-rank tensor decomposition leverage structural information to reduce the
parameter count. Multilinear logistic regression serves as a powerful tool for
the analysis of multidimensional data. To improve its efficacy and
interpretability, we present a Multilinear Sparse Logistic Regression model
with $\ell_0$-constraints ($\ell_0$-MLSR). In contrast to the $\ell_1$-norm and
$\ell_2$-norm, the $\ell_0$-norm constraint is better suited for feature
selection. However, due to its nonconvex and nonsmooth properties, solving it
is challenging and convergence guarantees are lacking. Additionally, the
multilinear operation in $\ell_0$-MLSR also brings non-convexity. To tackle
these challenges, we propose an Accelerated Proximal Alternating Linearized
Minimization with Adaptive Momentum (APALM$^+$) method to solve the
$\ell_0$-MLSR model. We provide a proof that APALM$^+$ can ensure the
convergence of the objective function of $\ell_0$-MLSR. We also demonstrate
that APALM$^+$ is globally convergent to a first-order critical point as well
as establish convergence rate by using the Kurdyka-Lojasiewicz property.
Empirical results obtained from synthetic and real-world datasets validate the
superior performance of our algorithm in terms of both accuracy and speed
compared to other state-of-the-art methods.
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n-2/3$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
Har is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data [17.657917523817243]
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional optimization problem.
In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches.
We derive rates of convergence in expectation, that are of order $mathcalO(log T/T)$ and $mathcalO (1/T1-iota)$ for any $iota>0$.
arXiv Detail & Related papers (2024-05-29T19:21:55Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE)
We leverage recently developed information-theoretic methods to establish the optimality of the LSE for non hypotheses classes.
We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS)
arXiv Detail & Related papers (2022-02-16T19:38:54Z) - An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - A conditional one-output likelihood formulation for multitask Gaussian
processes [0.0]
Multitask Gaussian processes (MTGP) are the Gaussian process framework's solution for multioutput regression problems.
Here we introduce a novel approach that simplifies the multitask learning.
We show that it is computationally competitive with state of the art options.
arXiv Detail & Related papers (2020-06-05T14:59:06Z) - Efficient algorithms for multivariate shape-constrained convex
regression problems [9.281671380673306]
We prove that the least squares estimator is computable via solving a constrained convex programming (QP) problem with $(n+1)d$ variables and at least $n(n-1)$ linear inequality constraints.
For solving the generally very large-scale convex QP, we design two efficient algorithms, one is the symmetric Gauss-Seidel based alternating direction method of multipliers (tt sGS-ADMM), and the other is the proximal augmented Lagrangian method (tt pALM) with the subproblems solved by the semismooth Newton method (t
arXiv Detail & Related papers (2020-02-26T11:18:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.