What to Do When Your Discrete Optimization Is the Size of a Neural
Network?
- URL: http://arxiv.org/abs/2402.10339v1
- Date: Thu, 15 Feb 2024 21:57:43 GMT
- Title: What to Do When Your Discrete Optimization Is the Size of a Neural
Network?
- Authors: Hugo Silva and Martha White
- Abstract summary: Machine learning applications using neural networks often involve solving discrete optimization problems.
Classical approaches used in discrete settings do not scale well to large neural networks.
The paper compares continuation path (CP) methods, which extrapolate gradient information from points outside the solution set, with Monte Carlo (MC) methods, which compare evaluations between sampled valid solutions.
- Score: 24.546550334179486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Oftentimes, machine learning applications using neural networks involve
solving discrete optimization problems, such as in pruning,
parameter-isolation-based continual learning and training of binary networks.
Still, these discrete problems are combinatorial in nature and are also not
amenable to gradient-based optimization. Additionally, classical approaches
used in discrete settings do not scale well to large neural networks, forcing
scientists and empiricists to rely on alternative methods. Among these, two
main distinct sources of top-down information can be used to lead the model to
good solutions: (1) extrapolating gradient information from points outside of
the solution set, and (2) comparing evaluations between members of a subset of
the valid solutions. We take continuation path (CP) methods as representative of
using purely the former and Monte Carlo (MC) methods as representative of the latter, while
also noting that some hybrid methods combine the two. The main goal of this
work is to compare both approaches. For that purpose, we first overview the two
classes while also discussing some of their drawbacks analytically. Then, in
the experimental section, we compare their performance, starting with smaller
microworld experiments, which allow more fine-grained control of problem
variables, and gradually moving towards larger problems, including neural
network regression and neural network pruning for image classification, where
we additionally compare against magnitude-based pruning.
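To make the two families concrete, here is a minimal sketch in numpy (our own toy illustration, not the authors' experimental setup): a binary mask over a fixed weight vector is optimized once with a continuation-path-style estimator, which relaxes the mask to a temperature-annealed sigmoid and takes gradients at points outside the discrete solution set, and once with a Monte Carlo score-function (REINFORCE) estimator, which only compares evaluations of sampled discrete masks. All names, dimensions, and hyperparameters below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w = rng.normal(size=d)                      # fixed weights of a toy model
target = w * (np.abs(w) > 0.5)              # keep only the large-magnitude entries

def loss(mask):
    """Quadratic loss of the masked weights against the target."""
    return np.sum((mask * w - target) ** 2)

# Continuation-path style: relax the mask to sigmoid(phi / tau) and differentiate.
def cp_gradient(phi, tau):
    m_soft = 1.0 / (1.0 + np.exp(-phi / tau))       # relaxed mask in (0, 1)
    dloss_dm = 2.0 * (m_soft * w - target) * w      # d loss / d m_soft
    dm_dphi = m_soft * (1.0 - m_soft) / tau         # derivative of the sigmoid
    return dloss_dm * dm_dphi

# Monte Carlo style: score-function (REINFORCE) estimate from discrete samples.
def mc_gradient(phi, n_samples=64):
    p = 1.0 / (1.0 + np.exp(-phi))                  # Bernoulli inclusion probabilities
    baseline = np.mean([loss(rng.binomial(1, p)) for _ in range(n_samples)])
    grad = np.zeros_like(phi)
    for _ in range(n_samples):
        m = rng.binomial(1, p)                      # sample a discrete mask
        grad += (loss(m) - baseline) * (m - p)      # (m - p) is the grad of log-prob
    return grad / n_samples

phi_cp, phi_mc = np.zeros(d), np.zeros(d)
for step in range(500):
    tau = max(0.05, 1.0 - step / 500)               # anneal the temperature
    phi_cp -= 0.1 * cp_gradient(phi_cp, tau)
    phi_mc -= 0.1 * mc_gradient(phi_mc)

print("CP mask:", (phi_cp > 0).astype(int), "loss:", loss(phi_cp > 0))
print("MC mask:", (phi_mc > 0).astype(int), "loss:", loss(phi_mc > 0))
```

Even at this scale the contrast mirrors the trade-off studied in the paper: the CP estimator is deterministic but biased toward the relaxed objective and sensitive to the temperature schedule, while the MC estimator is unbiased for the discrete objective but noisy.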
Related papers
- The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks [24.766470360665647]
We show that neural networks trained to learn solutions to inverse problems can find better solutions than classical methods, even on their training set.
Our findings suggest an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.
arXiv Detail & Related papers (2024-08-15T12:38:10Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
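As a hedged illustration of the solution-sampler idea (our own toy sketch, not the construction analyzed in the paper), the snippet below parameterizes an independent-Bernoulli distribution over cuts of a small random graph and improves it with a vanilla policy-gradient update; the entropy bonus is a generic stand-in for regularization and is not the paper's proposed scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = np.triu(rng.random((n, n)), k=1)    # random weighted graph
W = W + W.T                             # symmetric edge weights

def cut_value(x):
    """Total weight of edges crossing the cut defined by x in {0,1}^n."""
    return 0.5 * np.sum(W * (x[:, None] != x[None, :]))

phi = np.zeros(n)                       # logits of the solution sampler
for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-phi))
    xs = rng.binomial(1, p, size=(32, n))                 # sample candidate cuts
    vals = np.array([cut_value(x) for x in xs])
    score = xs - p                                        # grad of log-prob per sample
    grad = ((vals - vals.mean())[:, None] * score).mean(axis=0)
    entropy_grad = -phi * p * (1.0 - p)                   # entropy bonus (our choice)
    phi += 0.5 * (grad + 0.01 * entropy_grad)             # ascend the expected cut value

best = xs[np.argmax(vals)]
print("best cut in final batch:", best, "value:", round(cut_value(best), 3))
```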
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent
Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to optimization challenges presented by nonconvexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
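The convexification step can be sketched in a few lines: linearize a (partially trained) network around its current weights, treat the resulting Jacobian features as a fixed representation, and solve a convex regression problem on top of them. The centralized toy below is our own sketch (TCT itself uses bootstrapped neural tangent kernel features and operates in the federated setting) and shows the idea for a tiny two-layer network.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))
y = np.sin(X @ rng.normal(size=4))             # toy regression targets

# A small two-layer network; in TCT these weights would come from a trained model.
W1 = 0.5 * rng.normal(size=(16, 4))
w2 = 0.5 * rng.normal(size=16)

def jacobian_features(x):
    """Gradient of the scalar network output w.r.t. all parameters at x."""
    h = np.tanh(W1 @ x)
    d_w2 = h                                   # d f / d w2
    d_W1 = np.outer(w2 * (1.0 - h ** 2), x)    # d f / d W1 via the tanh chain rule
    return np.concatenate([d_W1.ravel(), d_w2])

Phi = np.stack([jacobian_features(x) for x in X])   # linearized feature matrix
lam = 1e-3                                          # ridge penalty (convex problem)
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
print("MSE of the convex, linearized model:", round(float(np.mean((Phi @ theta - y) ** 2)), 4))
```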
arXiv Detail & Related papers (2022-07-13T16:58:22Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
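For context, a textbook big-$M$ encoding of a single ReLU unit $y = \max(0, w^\top x + b)$ with known pre-activation bounds $L \le w^\top x + b \le U$ (with $L < 0 < U$) introduces one binary variable $z$; this is the standard formulation such MILPs build on, not necessarily the exact variant used in the paper:

$$
\begin{aligned}
& y \ge w^\top x + b, \qquad y \ge 0,\\
& y \le w^\top x + b - L\,(1 - z), \qquad y \le U\,z, \qquad z \in \{0, 1\}.
\end{aligned}
$$

Setting $z = 1$ forces $y = w^\top x + b$ (the active branch), while $z = 0$ forces $y = 0$; stacking these constraints layer by layer turns optimization over the trained network into a mixed-integer linear program, and tighter bounds $L, U$ yield stronger formulations.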
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
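The training loop behind learning a line of networks is simple to sketch: keep two sets of parameters, sample an interpolation coefficient at every step, and backpropagate the loss of the interpolated model into both endpoints. The toy linear-regression version below is our own minimal illustration (the paper applies the idea to deep networks and additionally discourages the endpoints from collapsing onto each other, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(256, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=256)

w1, w2 = rng.normal(size=5), rng.normal(size=5)   # two endpoints of the line
lr = 0.05
for step in range(2000):
    alpha = rng.random()                          # sample a point on the line
    w_alpha = (1 - alpha) * w1 + alpha * w2       # interpolated parameters
    grad = 2 * X.T @ (X @ w_alpha - y) / len(X)   # gradient of the MSE at w_alpha
    # Chain rule: d w_alpha / d w1 = (1 - alpha), d w_alpha / d w2 = alpha.
    w1 -= lr * (1 - alpha) * grad
    w2 -= lr * alpha * grad

for a in (0.0, 0.5, 1.0):
    w = (1 - a) * w1 + a * w2
    print(f"alpha={a:.1f}  mse={np.mean((X @ w - y) ** 2):.4f}")
```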
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating prediction accuracy and network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
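A minimal way to make the multiobjective view concrete (our illustration, not the paper's algorithm) is to trace the trade-off between loss and the number of nonzero weights for a toy linear model by sweeping a magnitude-pruning threshold and keeping only the Pareto-optimal points:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
w_true = np.concatenate([rng.normal(size=3), np.zeros(7)])   # sparse ground truth
y = X @ w_true + 0.05 * rng.normal(size=200)
w_fit = np.linalg.lstsq(X, y, rcond=None)[0]                  # dense least-squares fit

# Objective 1: mean squared error.  Objective 2: number of nonzero weights.
points = []
for thresh in np.linspace(0.0, np.abs(w_fit).max(), 20):
    w_pruned = np.where(np.abs(w_fit) > thresh, w_fit, 0.0)    # magnitude pruning
    mse = float(np.mean((X @ w_pruned - y) ** 2))
    points.append((mse, int(np.count_nonzero(w_pruned))))

# Keep only the points not strictly dominated in both objectives.
pareto = [p for p in points
          if not any(q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
                     for q in points)]
for mse, k in sorted(set(pareto), key=lambda t: t[1]):
    print(f"nonzeros={k:2d}  mse={mse:.4f}")
```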
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - ODEN: A Framework to Solve Ordinary Differential Equations using
Artificial Neural Networks [0.0]
We propose a specific loss function, which does not require knowledge of the exact solution, to evaluate the neural network's performance.
Neural networks are shown to be proficient at approximating continuous solutions within their training domains.
A user-friendly and adaptable open-source code (ODE$\mathcal{N}$) is provided on GitHub.
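The kind of loss referred to above, which can be evaluated without the exact solution, is the residual loss for a neural approximation $\hat{y}_\theta(t)$ of an ODE $y'(t) = f(t, y(t))$ with initial condition $y(t_0) = y_0$ (written here in our own notation, which may differ from the paper's):

$$
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{d\hat{y}_\theta}{dt}(t_i) - f\big(t_i, \hat{y}_\theta(t_i)\big)\right)^{2} + \lambda\,\big(\hat{y}_\theta(t_0) - y_0\big)^{2}
$$

Both terms involve only the network output and the known right-hand side $f$, so the loss can be driven toward zero without ever querying the true solution, which is what makes it usable as a performance measure.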
arXiv Detail & Related papers (2020-05-28T15:34:10Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires a much smaller number of communication rounds and still achieves a linear speedup in theory.
Experiments on several datasets demonstrate the effectiveness of our method and also corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization effects.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.