Reducing the Need for Backpropagation and Discovering Better Optima With
Explicit Optimizations of Neural Networks
- URL: http://arxiv.org/abs/2311.07498v1
- Date: Mon, 13 Nov 2023 17:38:07 GMT
- Title: Reducing the Need for Backpropagation and Discovering Better Optima With
Explicit Optimizations of Neural Networks
- Authors: Jake Ryland Williams and Haoran Zhao
- Abstract summary: We propose a computationally efficient alternative for optimizing neural networks.
We derive an explicit solution to a simple feed-forward language model.
We show that explicit solutions perform near-optimality in experiments.
- Score: 4.807347156077897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Iterative differential approximation methods that rely upon backpropagation
have enabled the optimization of neural networks; however, at present, they
remain computationally expensive, especially when training models at scale. In
this paper, we propose a computationally efficient alternative for optimizing
neural networks that can both reduce the costs of scaling neural networks and
provide high-efficiency optimizations for low-resource applications. We derive
an explicit solution to a simple feed-forward language model (LM) by
mathematically analyzing its gradients. This solution generalizes from
single-layer LMs to the class of all single-layer feed-forward
softmax-activated neural models trained on positive-valued features, as is
demonstrated by our extension of this solution application to MNIST digit
classification. For both LM and digit classifiers, we find computationally that
explicit solutions perform near-optimality in experiments showing that 1)
iterative optimization only marginally improves the explicit solution
parameters and 2) randomly initialized parameters iteratively optimize towards
the explicit solution. We also preliminarily apply the explicit solution
locally by layer in multi-layer networks and discuss how the solution's
computational savings increase with model complexity -- for both single- and
mult-layer applications of the explicit solution, we emphasize that the optima
achieved cannot be reached by backpropagation alone, i.e., better optima appear
discoverable only after explicit solutions are applied. Finally, we discuss the
solution's computational savings alongside its impact on model interpretability
and suggest future directions for the derivation of explicit solutions to
complex- and multi-layer architectures.
Related papers
- Self-Supervised Learning of Iterative Solvers for Constrained Optimization [0.0]
We propose a learning-based iterative solver for constrained optimization.
It can obtain very fast and accurate solutions by customizing the solver to a specific parametric optimization problem.
A novel loss function based on the Karush-Kuhn-Tucker conditions of optimality is introduced, enabling fully self-supervised training of both neural networks.
arXiv Detail & Related papers (2024-09-12T14:17:23Z) - Analyzing and Enhancing the Backward-Pass Convergence of Unrolled
Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Explicit Foundation Model Optimization with Self-Attentive Feed-Forward
Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z) - Optimal Sets and Solution Paths of ReLU Networks [56.40911684005949]
We develop an analytical framework to characterize the set of optimal ReLU networks.
We establish conditions for the neuralization of ReLU networks to be continuous, and develop sensitivity results for ReLU networks.
arXiv Detail & Related papers (2023-05-31T18:48:16Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer numerical networks with second degree activations.
We show that semidefinite neural and therefore global optimization is in complexity dimension and sample size for all input data.
The proposed approach is significantly faster to obtain better test accuracy compared to the standard backpropagation procedure.
arXiv Detail & Related papers (2021-01-07T08:43:01Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with neglibile loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 5 orders of magnitude compared with directly using methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.