Optimizing the optimizer for data driven deep neural networks and
physics informed neural networks
- URL: http://arxiv.org/abs/2205.07430v1
- Date: Mon, 16 May 2022 02:42:22 GMT
- Title: Optimizing the optimizer for data driven deep neural networks and
physics informed neural networks
- Authors: John Taylor, Wenyi Wang, Biswajit Bala, Tomasz Bednarz
- Abstract summary: We investigate the role of the optimizer in determining the quality of the model fit for neural networks with a small to medium number of parameters.
We find that the LM algorithm is able to rapidly converge to machine precision, offering significant benefits over other algorithms.
- Score: 2.54325834280441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the role of the optimizer in determining the quality of the model fit for neural networks with a small to medium number of parameters. We study the performance of Adam, an algorithm for first-order gradient-based optimization that uses adaptive momentum; the Levenberg-Marquardt (LM) algorithm, a second-order method; the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, also a second-order method; and L-BFGS, a low-memory version of BFGS. Using these optimizers we fit the function y = sinc(10x) using a neural network with a few parameters. This function has a variable amplitude and a constant frequency. We observe that the higher amplitude components of the function are fitted first and that Adam, BFGS and L-BFGS struggle to fit the lower amplitude components. We also solve the Burgers equation using a physics informed neural network (PINN) with the BFGS and LM optimizers. For our example problems with a small to medium number of weights, we find that the LM algorithm is able to rapidly converge to machine precision, offering significant benefits over the other optimizers. We further investigated the Adam optimizer with a range of models and found that Adam requires much deeper models with large numbers of hidden units, containing up to 26x more parameters, in order to achieve a model fit close to that achieved by the LM optimizer. The LM optimizer results illustrate that it may be possible to build models with far fewer parameters. We have implemented all our methods in Keras and TensorFlow 2.
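To make the first experiment concrete, below is a minimal sketch of fitting y = sinc(10x) with a small dense network in TensorFlow 2 / Keras using the Adam optimizer. The architecture (two tanh layers of 20 units), learning rate, training grid, and the sin(10x)/(10x) convention for sinc are illustrative assumptions rather than the paper's exact setup; the LM, BFGS and L-BFGS runs reported in the paper require optimizers that Keras does not ship by default.

```python
import numpy as np
import tensorflow as tf

# Sketch only: network size, learning rate, and data grid are assumptions
# for illustration, not the configuration used in the paper.
x = np.linspace(-1.0, 1.0, 1000, dtype="float32").reshape(-1, 1)
y = np.sinc(10.0 * x / np.pi)  # np.sinc(z) = sin(pi z)/(pi z), so this gives sin(10x)/(10x)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(x, y, epochs=2000, batch_size=100, verbose=0)
print("final MSE:", model.evaluate(x, y, verbose=0))
```

The PINN experiment instead asks the optimizer to drive the residual of the viscous Burgers equation, u_t + u u_x = nu u_xx, to zero at collocation points. A minimal sketch of that residual computed with nested gradient tapes is shown below; the viscosity value and the model's input layout are assumptions, since the abstract does not state them.

```python
import numpy as np
import tensorflow as tf

# Assumed viscosity for illustration; the common Burgers benchmark uses
# nu = 0.01/pi, but the abstract does not specify the value used.
NU = 0.01 / np.pi

def burgers_residual(model, x, t):
    """PDE residual u_t + u*u_x - NU*u_xx at collocation points (x, t)."""
    with tf.GradientTape() as outer:
        outer.watch(x)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([x, t])
            # Assumes the network maps a two-column input [x, t] to u(x, t).
            u = model(tf.concat([x, t], axis=1))
        u_x = inner.gradient(u, x)  # recorded by the outer tape for u_xx
        u_t = inner.gradient(u, t)
    u_xx = outer.gradient(u_x, x)
    return u_t + u * u_x - NU * u_xx

# Usage sketch: mean squared residual as the physics part of the PINN loss.
# x_col, t_col = ...  # collocation points as float32 tensors of shape (N, 1)
# physics_loss = tf.reduce_mean(tf.square(burgers_residual(model, x_col, t_col)))
```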
Related papers
- Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization [0.0]
We present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices.
Our method achieves state-of-the-art results, enabling unprecedented sparsification of neural networks.
arXiv Detail & Related papers (2024-09-27T15:48:39Z)
- Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse finetuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers [66.823588073584]
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performance in various applications.
Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs.
We propose a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs.
arXiv Detail & Related papers (2023-10-02T02:01:16Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine learning optimizer called LODO, which tries to online meta-learn the best preconditioner during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that our gradient approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
arXiv Detail & Related papers (2022-10-11T03:47:14Z)
- Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs [0.0]
Training deep neural networks consumes increasing computational resource shares in many compute centers.
We introduce a novel second-order optimization method that requires the effect of the Hessian on a vector only.
We compare the proposed second-order method with two state-of-the-art methods on five representative neural network problems.
arXiv Detail & Related papers (2022-08-03T12:38:23Z)
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to a significantly smaller and computationally cheaper sub-solver for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.