Sparse Regression at Scale: Branch-and-Bound rooted in First-Order
Optimization
- URL: http://arxiv.org/abs/2004.06152v2
- Date: Wed, 14 Apr 2021 20:18:21 GMT
- Title: Sparse Regression at Scale: Branch-and-Bound rooted in First-Order
Optimization
- Authors: Hussein Hazimeh, Rahul Mazumder, Ali Saab
- Abstract summary: We present a new exact MIP framework for $\ell_0$ regularized regression.
Our framework can scale to $p \sim 10^7$, achieving speedups of at least $5000$x.
We open source the implementation through our toolkit L0BnB.
- Score: 6.037383467521294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the least squares regression problem, penalized with a
combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a.
$\ell_0 \ell_2$ regularization). Recent work shows that the resulting
estimators are of key importance in many high-dimensional statistical settings.
However, exact computation of these estimators remains a major challenge.
Indeed, modern exact methods, based on mixed integer programming (MIP), face
difficulties when the number of features $p \sim 10^4$. In this work, we
present a new exact MIP framework for $\ell_0\ell_2$-regularized regression
that can scale to $p \sim 10^7$, achieving speedups of at least $5000$x,
compared to state-of-the-art exact methods. Unlike recent work, which relies on
modern commercial MIP solvers, we design a specialized nonlinear
branch-and-bound (BnB) framework, by critically exploiting the problem
structure. A key distinguishing component in our framework lies in efficiently
solving the node relaxations using a specialized first-order method, based on
coordinate descent (CD). Our CD-based method effectively leverages information
across the BnB nodes, through using warm starts, active sets, and gradient
screening. In addition, we design a novel method for obtaining dual bounds from
primal CD solutions, which certifiably works in high dimensions. Experiments on
synthetic and real high-dimensional datasets demonstrate that our framework is
not only significantly faster than the state of the art, but can also deliver
certifiably optimal solutions to statistically challenging instances that
cannot be handled with existing methods. We open source the implementation
through our toolkit L0BnB.
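To make the setup concrete, the objective studied here is $\min_{\beta} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda_0 \|\beta\|_0 + \lambda_2 \|\beta\|_2^2$. Below is a minimal NumPy sketch of the closed-form coordinate descent update for this objective, assuming the columns of $X$ are normalized to unit $\ell_2$ norm; it illustrates the kind of CD step the framework builds on, and is not the paper's node-relaxation solver or the L0BnB implementation (the function name is illustrative).

```python
import numpy as np

def l0l2_cd(X, y, lam0, lam2, n_iters=100):
    """Cyclic coordinate descent for
    0.5*||y - X b||^2 + lam0*||b||_0 + lam2*||b||_2^2,
    assuming the columns of X have unit l2 norm."""
    p = X.shape[1]
    beta = np.zeros(p)
    r = y.astype(float).copy()  # residual y - X @ beta (beta starts at zero)
    thresh = np.sqrt(2.0 * lam0 * (1.0 + 2.0 * lam2))
    for _ in range(n_iters):
        for i in range(p):
            # Unpenalized minimizer along coordinate i (unit-norm column).
            b_tilde = beta[i] + X[:, i] @ r
            # Keep the coordinate only if it beats the l0 cost;
            # otherwise hard-threshold to zero. Kept values are ridge-shrunk.
            b_new = b_tilde / (1.0 + 2.0 * lam2) if abs(b_tilde) > thresh else 0.0
            if b_new != beta[i]:
                r += X[:, i] * (beta[i] - b_new)  # incremental residual update
                beta[i] = b_new
    return beta

# Tiny usage example on synthetic data with 5 true nonzeros.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
X /= np.linalg.norm(X, axis=0)  # normalize columns to unit norm
beta_true = np.zeros(1000)
beta_true[:5] = 1.0
y = X @ beta_true + 0.01 * rng.standard_normal(100)
beta_hat = l0l2_cd(X, y, lam0=0.01, lam2=0.001)
print("support:", np.flatnonzero(beta_hat))
```

The threshold follows from comparing the two coordinate-wise options: keeping coordinate $i$ nonzero incurs the $\lambda_0$ cost but reduces the smooth part by $\tilde{\beta}_i^2 / (2(1 + 2\lambda_2))$, so a coordinate is kept exactly when $|\tilde{\beta}_i| > \sqrt{2\lambda_0(1 + 2\lambda_2)}$. The released L0BnB toolkit layers the paper's warm starts, active sets, gradient screening, and dual bounds on top of such first-order updates.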
Related papers
- Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs [63.47351876442425]
We study episodic linear mixture MDPs with unknown transitions and adversarial rewards under full-information feedback.
We propose a novel algorithm that combines the benefits of two popular methods: occupancy-measure-based and policy-based.
Our algorithm enjoys an $\widetilde{\mathcal{O}}(d\sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature dimension.
arXiv Detail & Related papers (2024-11-05T13:55:52Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - Sparse Gaussian Graphical Models with Discrete Optimization:
Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z) - Compressed Sensing: A Discrete Optimization Approach [5.877778007271621]
We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm.
When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods.
arXiv Detail & Related papers (2023-06-05T01:29:24Z) - Scalable Optimal Multiway-Split Decision Trees with Constraints [3.092691764363848]
We present a novel path-based MIP formulation where the number of decision variables is independent of $N$.
Our framework produces a multiway-split tree which is more interpretable than the typical binary-split trees due to its shorter rules.
We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points.
arXiv Detail & Related papers (2023-02-14T03:48:48Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, expressed as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Grouped Variable Selection with Discrete Optimization: Computational and
Statistical Perspectives [9.593208961737572]
We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization.
Our methodology covers both high-dimensional linear regression and nonparametric sparse additive modeling with smooth components.
Our exact algorithm is based on a standalone branch-and-bound (BnB) framework, which can solve the associated mixed integer programming (MIP) problem to certified optimality.
arXiv Detail & Related papers (2021-04-14T19:21:59Z) - Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives [10.291482850329892]
Mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems.
We propose two classes of scalable algorithms: an exact algorithm that can handle $p \approx 50,000$ features in a few minutes, and approximate algorithms that can address instances with $p \approx 10^6$ features.
In addition, we present new estimation error bounds for $\ell_0$-regularized estimators.
arXiv Detail & Related papers (2020-01-17T18:47:02Z)