Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives
- URL: http://arxiv.org/abs/2001.06471v2
- Date: Sun, 6 Jun 2021 17:38:09 GMT
- Title: Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives
- Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder
- Abstract summary: Mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems.
We propose two classes of scalable algorithms: an exact algorithm that can handle $p\approx 50,000$ features in a few minutes, and approximate algorithms that can address instances with $p\approx 10^6$.
In addition, we present new estimation error bounds for a class of $\ell_0$-regularized estimators.
- Score: 10.291482850329892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a discrete optimization formulation for learning sparse
classifiers, where the outcome depends upon a linear combination of a small
subset of features. Recent work has shown that mixed integer programming (MIP)
can be used to solve (to optimality) $\ell_0$-regularized regression problems
at scales much larger than what was conventionally considered possible. Despite
their usefulness, MIP-based global optimization approaches are significantly
slower compared to the relatively mature algorithms for $\ell_1$-regularization
and heuristics for nonconvex regularized problems. We aim to bridge this gap in
computation times by developing new MIP-based algorithms for
$\ell_0$-regularized classification. We propose two classes of scalable
algorithms: an exact algorithm that can handle $p\approx 50,000$ features in a
few minutes, and approximate algorithms that can address instances with
$p\approx 10^6$ in times comparable to the fast $\ell_1$-based algorithms. Our
exact algorithm is based on the novel idea of \textsl{integrality generation},
which solves the original problem (with $p$ binary variables) via a sequence of
mixed integer programs that involve a small number of binary variables. Our
approximate algorithms are based on coordinate descent and local combinatorial
search. In addition, we present new estimation error bounds for a class of
$\ell_0$-regularized estimators. Experiments on real and synthetic data
demonstrate that our approach leads to models with considerably improved
statistical performance (especially, variable selection) when compared to
competing methods.
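As a rough illustration of the coordinate-descent-plus-local-combinatorial-search idea behind the approximate algorithms, the sketch below runs iterative hard thresholding (a projected-gradient relative of coordinate descent) followed by a one-swap local search on a cardinality-constrained logistic regression. This is a minimal NumPy sketch of the generic two-stage technique, not the authors' implementation: the constrained (rather than $\ell_0$-penalized) formulation, full-gradient steps instead of true coordinate updates, the absence of coefficient refitting after swaps, and all function names are simplifying assumptions.

```python
import numpy as np

def logistic_loss(X, y, beta):
    """Average logistic loss with labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ beta)))

def logistic_grad(X, y, beta):
    """Gradient of the average logistic loss."""
    margins = y * (X @ beta)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

def hard_threshold_sketch(X, y, k, n_iters=200):
    """Projected-gradient / hard-thresholding sketch for
    min_beta logistic_loss(beta)  subject to  ||beta||_0 <= k."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)   # Lipschitz bound for the gradient
    beta = np.zeros(p)
    for _ in range(n_iters):
        z = beta - logistic_grad(X, y, beta) / L  # full gradient step
        support = np.argsort(np.abs(z))[-k:]      # keep the k largest coordinates
        beta = np.zeros(p)
        beta[support] = z[support]
    return beta

def one_swap_local_search(X, y, beta, max_rounds=10):
    """Crude one-swap local combinatorial search: move a coefficient from a
    support coordinate to a non-support coordinate whenever that lowers the
    loss. (A real implementation would refit coefficients after each swap.)"""
    p = beta.shape[0]
    for _ in range(max_rounds):
        support = np.flatnonzero(beta)
        outside = np.setdiff1d(np.arange(p), support)
        best, best_loss, improved = beta, logistic_loss(X, y, beta), False
        for i in support:
            for j in outside:
                cand = beta.copy()
                cand[j], cand[i] = cand[i], 0.0    # move coefficient i -> j
                cand_loss = logistic_loss(X, y, cand)
                if cand_loss < best_loss:
                    best, best_loss, improved = cand, cand_loss, True
        beta = best
        if not improved:
            break
    return beta

# Toy usage on synthetic data with 5 true nonzero coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 60))
beta_true = np.zeros(60); beta_true[:5] = 1.0
y = np.sign(X @ beta_true + 0.1 * rng.standard_normal(300))
beta_hat = one_swap_local_search(X, y, hard_threshold_sketch(X, y, k=5))
print(np.flatnonzero(beta_hat))
```

The paper's algorithms are considerably more refined than this sketch; the snippet is only meant to make the two-stage structure concrete: a cheap thresholding-style pass to find a candidate support, followed by local combinatorial improvement of that support.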
Related papers
- Approximation Algorithms for Preference Aggregation Using CP-Nets [3.337244446073836]
This paper studies the design and analysis of approximation algorithms for aggregating preferences over Conditional Preference Networks (CP-nets).
Its focus is on aggregating preferences over so-called \emph{swaps}, for which optimal solutions in general are already known to be of exponential size.
arXiv Detail & Related papers (2023-12-14T17:31:38Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Grouped Variable Selection with Discrete Optimization: Computational and
Statistical Perspectives [9.593208961737572]
We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization.
Our methodology covers both high-dimensional linear regression and nonparametric sparse additive modeling with smooth components.
Our exact algorithm is based on a standalone branch-and-bound (BnB) framework, which can solve the associated mixed integer programming (MIP) problem to certified optimality.
arXiv Detail & Related papers (2021-04-14T19:21:59Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Practical and Parallelizable Algorithms for Non-Monotone Submodular
Maximization with Size Constraint [20.104148319012854]
We present combinatorial and parallelizable algorithms for maximization of a submodular function, not necessarily monotone, with respect to a size constraint.
We improve the best approximation factor achieved by an algorithm that has optimal adaptivity and nearly optimal query complexity to $0.193 - \varepsilon$.
arXiv Detail & Related papers (2020-09-03T22:43:55Z) - A new heuristic algorithm for fast k-segmentation [0.0]
Exact and approximate methods for $k$-segmentation exist in the literature.
A novel algorithm is proposed in this paper to improve upon existing methods.
It is empirically found to provide accuracies competitive with exact methods at a fraction of the computational expense.
arXiv Detail & Related papers (2020-09-02T04:50:17Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)