Train Like a (Var)Pro: Efficient Training of Neural Networks with
Variable Projection
- URL: http://arxiv.org/abs/2007.13171v2
- Date: Mon, 19 Apr 2021 22:09:21 GMT
- Title: Train Like a (Var)Pro: Efficient Training of Neural Networks with
Variable Projection
- Authors: Elizabeth Newman, Lars Ruthotto, Joseph Hart, Bart van Bloemen
Waanders
- Abstract summary: Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks.
In this paper, we consider the supervised training of DNNs, which arises in many state-of-the-art applications.
- Score: 2.7561479348365734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance across
a variety of traditional machine learning tasks, e.g., speech recognition,
image classification, and segmentation. The ability of DNNs to efficiently
approximate high-dimensional functions has also motivated their use in
scientific applications, e.g., to solve partial differential equations (PDE)
and to generate surrogate models. In this paper, we consider the supervised
training of DNNs, which arises in many of the above applications. We focus on
the central problem of optimizing the weights of the given DNN such that it
accurately approximates the relation between observed input and target data.
Devising effective solvers for this optimization problem is notoriously
challenging due to the large number of weights, non-convexity, data-sparsity,
and non-trivial choice of hyperparameters. To solve the optimization problem
more efficiently, we propose the use of variable projection (VarPro), a method
originally designed for separable nonlinear least-squares problems. Our main
contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach
of the VarPro idea to non-quadratic objective functions, most notably,
cross-entropy loss functions arising in classification. These extensions make
GNvpro applicable to all training problems that involve a DNN whose last layer
is an affine mapping, which is common in many state-of-the-art architectures.
In our four numerical experiments from surrogate modeling, segmentation, and
classification, GNvpro solves the optimization problem more efficiently than
commonly-used stochastic gradient descent (SGD) schemes. Also, GNvpro finds
solutions that generalize well to unseen data points, in all but one example
better than well-tuned SGD methods.
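To make the separable structure behind VarPro concrete: a DNN whose last layer is affine can be written as W phi(y; theta) + b, where phi(y; theta) collects all hidden layers. For a quadratic (least-squares) loss, the optimal (W, b) for fixed theta is a linear least-squares solution, so those weights can be eliminated and the remaining optimization runs over theta only; GNvpro applies a Gauss-Newton scheme to this reduced problem and extends the idea to non-quadratic losses such as cross-entropy. The sketch below illustrates only the elimination step, using a plain Adam update on theta instead of the paper's Gauss-Newton step; the toy data, layer sizes, and names such as feature_net are illustrative assumptions rather than the paper's setup.

```python
# Minimal VarPro-style training sketch for a network with an affine last layer.
# Assumptions: toy regression data, a small Tanh feature extractor, and a
# first-order (Adam) step on the reduced objective instead of Gauss-Newton.
import torch

torch.manual_seed(0)

# Toy regression data: targets c = f(y) + noise.
n, d_in, d_out = 512, 4, 2
Y = torch.randn(n, d_in)
C = torch.sin(Y @ torch.randn(d_in, d_out)) + 0.01 * torch.randn(n, d_out)

# Nonlinear feature extractor phi(y; theta): every layer except the final affine one.
feature_net = torch.nn.Sequential(
    torch.nn.Linear(d_in, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
)
opt = torch.optim.Adam(feature_net.parameters(), lr=1e-3)

def solve_last_layer(Phi, C):
    """Inner, convex problem: closed-form min_W ||[Phi, 1] W - C||_F^2 (W includes the bias row)."""
    Phi1 = torch.cat([Phi, torch.ones(Phi.shape[0], 1)], dim=1)
    return torch.linalg.lstsq(Phi1, C).solution

for it in range(200):
    Phi = feature_net(Y)                      # features depend on theta
    W = solve_last_layer(Phi.detach(), C)     # eliminate the affine layer for the current theta
    # Reduced objective: because the gradient in W vanishes at W*(theta), treating W as a
    # constant while backpropagating through Phi gives the exact gradient in theta.
    Phi1 = torch.cat([Phi, torch.ones(n, 1)], dim=1)
    loss = 0.5 * ((Phi1 @ W - C) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("reduced-objective training loss:", loss.item())
```

For the cross-entropy case handled by GNvpro, the inner problem is no longer a linear least-squares solve; it becomes a small convex optimization in the last-layer weights (multinomial logistic regression), but the outer reduced problem over theta keeps the same shape.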
Related papers
- DiffGrad for Physics-Informed Neural Networks [0.0]
Burgers' equation, a fundamental equation in fluid dynamics that is used extensively with PINNs, provides flexible results with Adamprop.
This paper introduces a novel strategy for solving Burgers' equation by incorporating DiffGrad with PINNs.
arXiv Detail & Related papers (2024-09-05T04:39:35Z) - Enhancing GNNs Performance on Combinatorial Optimization by Recurrent Feature Update [0.09986418756990156]
We introduce a novel algorithm, denoted hereafter as QRF-GNN, leveraging the power of GNNs to efficiently solve combinatorial optimization (CO) problems.
It relies on unsupervised learning by minimizing the loss function derived from QUBO relaxation.
Experimental results show that QRF-GNN drastically surpasses existing learning-based approaches and is comparable to state-of-the-art conventional solvers (a generic sketch of the QUBO-relaxation loss appears after this list).
arXiv Detail & Related papers (2024-07-23T13:34:35Z) - Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for the weights of ANNs is a large-scale optimization problem.
arXiv Detail & Related papers (2024-02-20T01:47:25Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Data-informed Deep Optimization [3.331457049134526]
We propose a data-informed deep optimization (DiDo) approach to solve high-dimensional design problems.
We use a deep neural network (DNN) to learn the feasible region and to sample feasible points for fitting the objective function.
Our results indicate that the DiDo approach, empowered by DNNs, is flexible and promising for solving general high-dimensional design problems in practice.
arXiv Detail & Related papers (2021-07-17T02:53:54Z) - dNNsolve: an efficient NN-based PDE solver [62.997667081978825]
We introduce dNNsolve, which makes use of dual neural networks to solve ODEs/PDEs.
We show that dNNsolve is capable of solving a broad range of ODEs/PDEs in 1, 2 and 3 spacetime dimensions.
arXiv Detail & Related papers (2021-03-15T19:14:41Z) - Deep Neural Networks Are Effective At Learning High-Dimensional
Hilbert-Valued Functions From Limited Data [6.098254376499899]
We focus on approximating functions that are Hilbert-valued, i.e., take values in a separable, but typically infinite-dimensional, Hilbert space.
We present a novel result on DNN training for holomorphic functions with so-called hidden anisotropy.
We show that there exists a procedure for learning Hilbert-valued functions via DNNs that performs as well as, but no better than, current best-in-class schemes.
arXiv Detail & Related papers (2020-12-11T02:02:14Z) - Towards an Efficient and General Framework of Robust Training for Graph
Neural Networks [96.93500886136532]
Graph Neural Networks (GNNs) have made significant advances on several fundamental inference tasks.
Despite GNNs' impressive performance, it has been observed that carefully crafted perturbations on graph structures lead them to make wrong predictions.
We propose a general framework which leverages greedy search algorithms and zeroth-order methods to obtain robust GNNs.
arXiv Detail & Related papers (2020-02-25T15:17:58Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed online learning optimization integrates a deep neural network (DNN) with finite element method (FEM) calculations.
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with direct use of these methods and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.