Optimizing Neural Networks through Activation Function Discovery and
Automatic Weight Initialization
- URL: http://arxiv.org/abs/2304.03374v1
- Date: Thu, 6 Apr 2023 21:01:00 GMT
- Title: Optimizing Neural Networks through Activation Function Discovery and
Automatic Weight Initialization
- Authors: Garrett Bingham
- Abstract summary: The dissertation introduces techniques for discovering more powerful activation functions and establishing more robust weight initialization for neural networks.
It provides new perspectives on neural network optimization.
The dissertation thus makes concrete progress towards fully automatic machine learning.
- Score: 0.5076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated machine learning (AutoML) methods improve upon existing models by
optimizing various aspects of their design. While present methods focus on
hyperparameters and neural network topologies, other aspects of neural network
design can be optimized as well. To further the state of the art in AutoML,
this dissertation introduces techniques for discovering more powerful
activation functions and establishing more robust weight initialization for
neural networks. These contributions improve performance, but also provide new
perspectives on neural network optimization. First, the dissertation
demonstrates that discovering solutions specialized to specific architectures
and tasks gives better performance than reusing general approaches. Second, it
shows that jointly optimizing different components of neural networks is
synergistic, and results in better performance than optimizing individual
components alone. Third, it demonstrates that learned representations are
easier to optimize than hard-coded ones, creating further opportunities for
AutoML. The dissertation thus makes concrete progress towards fully automatic
machine learning in the future.
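To make the abstract's two threads concrete, the sketch below pairs a toy parametric activation family with a data-dependent, variance-preserving weight initialization. It is a minimal illustration written for this summary, not the dissertation's actual techniques; the function family, layer sizes, and rescaling rule are all assumptions chosen for brevity.

```python
# Minimal sketch, not the dissertation's methods: (a) a toy parametric
# activation family and (b) a data-dependent, variance-preserving init.
import numpy as np

rng = np.random.default_rng(0)

def make_activation(a, b, c):
    """Toy activation family f(x) = a*relu(x) + b*tanh(x) + c*x.
    Searching over (a, b, c) per layer is a crude stand-in for
    activation function discovery."""
    return lambda x: a * np.maximum(x, 0.0) + b * np.tanh(x) + c * x

def auto_init(layer_sizes, x_sample, act):
    """Initialize an MLP layer by layer, rescaling each weight matrix so the
    empirical variance of its pre-activations on a sample batch is ~1.
    This keeps signal magnitudes stable regardless of the chosen activation."""
    weights, h = [], x_sample
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        W /= np.sqrt((h @ W).var() + 1e-8)  # variance-preserving rescale
        weights.append(W)
        h = act(h @ W)
    return weights

# Example: initialize a 3-layer network under a candidate activation.
act = make_activation(a=1.0, b=0.5, c=0.1)
x = rng.normal(size=(256, 32))            # sample batch stands in for real data
weights = auto_init([32, 64, 64, 10], x, act)
```

The rescaling step is what makes the initialization "automatic" in spirit: it adapts to whatever activation function and architecture are plugged in, rather than relying on a hand-derived formula for one specific design.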
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Physics Informed Piecewise Linear Neural Networks for Process Optimization [0.0]
The paper proposes augmenting piecewise linear neural network models with physics-informed knowledge for optimization problems that embed neural network models.
In all cases, the optima obtained with physics-informed trained neural networks are closer to global optimality.
arXiv Detail & Related papers (2023-02-02T10:14:54Z)
- Transformer-Based Learned Optimization [37.84626515073609]
We propose a new approach to learned optimization in which the computation of the optimizer's update step is represented with a neural network.
Our innovation is a new neural network architecture inspired by the classic BFGS algorithm.
We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms.
arXiv Detail & Related papers (2022-12-02T09:47:08Z)
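As a rough illustration of the learned-optimization idea in the entry above, the sketch below computes the optimizer's update step with a tiny two-layer network over per-parameter features. The feature set, network sizes, and quadratic optimizee are invented for this example; the paper itself uses a Transformer architecture inspired by BFGS rather than this toy MLP.

```python
# Minimal sketch (invented for illustration): a learned optimizer whose update
# step is computed by a tiny neural network from per-parameter features.
import numpy as np

rng = np.random.default_rng(1)

# Meta-parameters of the update network (these would be meta-trained in
# practice; here they are just randomly initialized).
W1 = rng.normal(0.0, 0.1, size=(2, 16))
W2 = rng.normal(0.0, 0.1, size=(16, 1))

def learned_update(grad, momentum):
    """Map per-parameter features (gradient, momentum) to an update direction."""
    feats = np.stack([grad, momentum], axis=-1)   # shape (n_params, 2)
    hidden = np.tanh(feats @ W1)                  # shape (n_params, 16)
    return (hidden @ W2).squeeze(-1)              # shape (n_params,)

# Optimizee: a simple quadratic objective f(theta) = ||A theta - b||^2.
A = rng.normal(size=(8, 4))
b = rng.normal(size=8)
theta = rng.normal(size=4)
momentum = np.zeros_like(theta)

for step in range(100):
    grad = 2 * A.T @ (A @ theta - b)      # analytic gradient of the objective
    momentum = 0.9 * momentum + grad
    # With untrained meta-parameters this loop only exercises the interface;
    # meta-training would fit W1, W2 so that the rule minimizes f quickly.
    theta = theta - 0.01 * learned_update(grad, momentum)
```

In a real learned optimizer, W1 and W2 would be meta-trained across many optimizee tasks so the update rule generalizes; they are left random here purely to show the control flow.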
- Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
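As a rough stand-in for the GradCosine idea above, the following sketch computes the average cosine similarity between per-sample gradients at initialization for a linear least-squares model, where per-sample gradients are available in closed form. The paper's actual quantity, its theoretical analysis, and the norm-constrained maximization used by NIO are more involved than this toy.

```python
# Rough illustration (not the paper's exact definition): average cosine
# similarity between per-sample gradients at initialization, for a linear
# regression model whose per-sample gradients have a closed form.
import numpy as np

rng = np.random.default_rng(2)

def mean_grad_cosine(w, X, y):
    """Mean pairwise cosine similarity of per-sample gradients of the squared
    error loss l_i = (w.x_i - y_i)^2, whose gradient is 2*(w.x_i - y_i)*x_i."""
    grads = 2.0 * (X @ w - y)[:, None] * X                       # (n_samples, dim)
    grads = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    cos = grads @ grads.T                                        # pairwise cosines
    n = len(y)
    return (cos.sum() - n) / (n * (n - 1))                       # drop self-similarity

X = rng.normal(size=(64, 10))
y = rng.normal(size=64)

# Compare a few random initialization scales: higher alignment of per-sample
# gradients is what the paper links to better training and test performance.
scores = {s: mean_grad_cosine(rng.normal(0.0, s, size=10), X, y) for s in (0.1, 1.0, 10.0)}
print(scores)
```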
- Position-wise optimizer: A nature-inspired optimization algorithm [0.0]
A novel nature-inspired optimization algorithm is introduced that imitates biological neural plasticity.
The model is tested on three datasets and the results are compared with gradient descent optimization.
arXiv Detail & Related papers (2022-04-11T15:30:52Z)
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and Radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
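To illustrate the pipeline in the trajectory-optimization entry above at toy scale, the sketch below fits a linear dynamics model from random transitions and then optimizes an action sequence by gradient descent on the rollout cost. Everything here is an assumption made for brevity: the paper learns nonlinear neural-network dynamics for real robots and differentiates through the model, whereas this sketch uses a linear model and numerical gradients.

```python
# Minimal sketch (not the paper's system): fit a linear dynamics model from
# random transitions, then optimize an action sequence by gradient descent on
# the rollout cost. Numerical gradients stand in for autodiff here.
import numpy as np

rng = np.random.default_rng(3)

# Unknown "true" dynamics: s' = A s + B a (the paper learns far richer,
# nonlinear dynamics with a neural network).
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# 1) Learn a dynamics model from random transitions by least squares.
S = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
S_next = S @ A_true.T + U @ B_true.T
theta, *_ = np.linalg.lstsq(np.hstack([S, U]), S_next, rcond=None)   # (3, 2)

def learned_step(s, a):
    return np.concatenate([s, a]) @ theta

# 2) Trajectory optimization: drive the state toward the origin over H steps.
H = 20
def rollout_cost(actions, s0=np.array([1.0, 0.0])):
    s, cost = s0, 0.0
    for a in actions.reshape(H, 1):
        s = learned_step(s, a)
        cost += s @ s + 0.01 * float(a @ a)
    return cost

actions = np.zeros(H)
for _ in range(200):                       # gradient descent with numeric grads
    grad = np.zeros(H)
    eps = 1e-4
    for i in range(H):
        e = np.zeros(H); e[i] = eps
        grad[i] = (rollout_cost(actions + e) - rollout_cost(actions - e)) / (2 * eps)
    actions -= 0.05 * grad
```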
- Acceleration techniques for optimization over trained neural network ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z)
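For context on the big-$M$ formulation mentioned above, the standard mixed-integer encoding of a single ReLU unit $y = \max(0, w^\top x + b)$ uses a binary indicator $z$ and a constant $M$ bounding $|w^\top x + b|$ over the feasible region; the paper's contribution concerns acceleration techniques and ensembles built on top of such building blocks.

$$
\begin{aligned}
& y \ge w^\top x + b, \quad y \ge 0, \\
& y \le w^\top x + b + M(1 - z), \quad y \le M z, \quad z \in \{0, 1\}.
\end{aligned}
$$

When $z = 1$ the constraints force $y = w^\top x + b \ge 0$; when $z = 0$ they force $y = 0$ with $w^\top x + b \le 0$, reproducing the ReLU exactly as long as $M$ is a valid bound.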
- Can we learn gradients by Hamiltonian Neural Networks? [68.8204255655161]
We propose a meta-learner based on ODE neural networks that learns gradients.
We demonstrate that our method outperforms an LSTM-based meta-learner on an artificial task and on the MNIST dataset, with ReLU activations in the optimizee.
arXiv Detail & Related papers (2021-10-31T18:35:10Z)
- Convolution Neural Network Hyperparameter Optimization Using Simplified Swarm Optimization [2.322689362836168]
Convolutional Neural Networks (CNNs) are widely used in computer vision.
However, it is not easy to find a network architecture and hyperparameter settings that deliver good performance.
arXiv Detail & Related papers (2021-03-06T00:23:27Z)