Maximin Optimization for Binary Regression
- URL: http://arxiv.org/abs/2010.05077v3
- Date: Sat, 28 Nov 2020 00:35:28 GMT
- Title: Maximin Optimization for Binary Regression
- Authors: Nisan Chiprut, Amir Globerson, Ami Wiesel
- Abstract summary: Regression problems with binary weights are ubiquitous in quantized learning models and digital communication systems.
The Lagrangian method also performs well in regression with cross-entropy loss, as well as in non-convex multi-layer neural networks.
- Score: 24.351803097593887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider regression problems with binary weights. Such optimization
problems are ubiquitous in quantized learning models and digital communication
systems. A natural approach is to optimize the corresponding Lagrangian using
variants of the gradient ascent-descent method. Such maximin techniques are
still poorly understood even in the concave-convex case. The non-convex binary
constraints may lead to spurious local minima. Interestingly, we prove that
this approach is optimal in linear regression with low noise conditions as well
as robust regression with a small number of outliers. Practically, the method
also performs well in regression with cross entropy loss, as well as non-convex
multi-layer neural networks. Taken together our approach highlights the
potential of saddle-point optimization for learning constrained models.
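As a rough illustration of the gradient ascent-descent idea described in the abstract, the sketch below runs simultaneous descent on the weights and ascent on the multipliers for a small synthetic linear regression. It is not the authors' algorithm or code. It assumes a squared loss, the binary constraint written as w_i^2 = 1 with one multiplier per weight, plain fixed step sizes, and a low-noise synthetic dataset; the function names `binary_regression_gda` and `make_problem` and all parameter values are hypothetical.

```python
import numpy as np


def make_problem(n=200, d=10, noise=0.05, seed=0):
    """Synthetic low-noise regression with true weights in {-1, +1} (assumed setup)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = rng.choice([-1.0, 1.0], size=d)
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y, w_true


def binary_regression_gda(X, y, steps=2000, lr_w=0.1, lr_lam=0.01):
    """Gradient descent in w and gradient ascent in lam on the assumed Lagrangian

        L(w, lam) = (1/n) * ||X w - y||^2 + sum_i lam_i * (w_i^2 - 1).
    """
    n, d = X.shape
    w = np.zeros(d)
    lam = np.zeros(d)
    for _ in range(steps):
        # Gradients are evaluated at the current (w, lam) before either is updated.
        grad_w = (2.0 / n) * (X.T @ (X @ w - y)) + 2.0 * lam * w
        grad_lam = w ** 2 - 1.0
        w = w - lr_w * grad_w          # descent step on the weights
        lam = lam + lr_lam * grad_lam  # ascent step on the multipliers
    return w, lam


if __name__ == "__main__":
    X, y, w_true = make_problem()
    w_hat, lam = binary_regression_gda(X, y)
    print("signs recovered:", np.array_equal(np.sign(w_hat), w_true))
    print("max |w_i^2 - 1|:", np.max(np.abs(w_hat ** 2 - 1.0)))
```

The multiplier step size is kept an order of magnitude smaller than the weight step size so that the multipliers stay small while the weights settle; this is a tuning choice for this toy problem, not a recommendation taken from the paper.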
Related papers
- Efficient Optimization Algorithms for Linear Adversarial Training [9.933836677441684]
Adversarial training can be used to learn models that are robust against perturbations.
We propose tailored optimization algorithms for the adversarial training of linear models.
arXiv Detail & Related papers (2024-10-16T15:41:08Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme in which the dual objective serves as a loss function and solution estimates are driven toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex training objective can be characterized as the global optimum of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression [15.389011827844572]
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce an algorithm for regression problems in which the noise is Gaussian or heavy-tailed with a finite 1 + epsilon moment.
arXiv Detail & Related papers (2023-05-10T14:31:03Z)
- Slowly Varying Regression under Sparsity [5.22980614912553]
We present the framework of slowly varying regression under sparsity, allowing regression models to exhibit slow and sparse variations.
We suggest a procedure that reformulates the problem as a binary convex program.
We show that the resulting model outperforms competing formulations in comparable times across various datasets.
arXiv Detail & Related papers (2021-02-22T04:51:44Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints.
Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals.
We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation.
arXiv Detail & Related papers (2020-08-27T09:15:02Z)
- Regression via Implicit Models and Optimal Transport Cost Minimization [5.144809478361603]
Conditional GAN (CGAN) has been applied for regression.
Current CGAN implementation for regression uses the classical generator-discriminator architecture.
We propose a solution which directly optimizes the optimal transport cost between the true probability distribution $p(y|x)$ and the estimated distribution $\hat{p}(y|x)$.
arXiv Detail & Related papers (2020-03-03T02:26:54Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.