Train Like a (Var)Pro: Efficient Training of Neural Networks with
Variable Projection
- URL: http://arxiv.org/abs/2007.13171v2
- Date: Mon, 19 Apr 2021 22:09:21 GMT
- Title: Train Like a (Var)Pro: Efficient Training of Neural Networks with
Variable Projection
- Authors: Elizabeth Newman, Lars Ruthotto, Joseph Hart, Bart van Bloemen
Waanders
- Abstract summary: Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks.
In this paper, we consider the supervised training of DNNs, which arises in many state-of-the-art applications.
- Score: 2.7561479348365734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance across
a variety of traditional machine learning tasks, e.g., speech recognition,
image classification, and segmentation. The ability of DNNs to efficiently
approximate high-dimensional functions has also motivated their use in
scientific applications, e.g., to solve partial differential equations (PDE)
and to generate surrogate models. In this paper, we consider the supervised
training of DNNs, which arises in many of the above applications. We focus on
the central problem of optimizing the weights of the given DNN such that it
accurately approximates the relation between observed input and target data.
Devising effective solvers for this optimization problem is notoriously
challenging due to the large number of weights, non-convexity, data-sparsity,
and non-trivial choice of hyperparameters. To solve the optimization problem
more efficiently, we propose the use of variable projection (VarPro), a method
originally designed for separable nonlinear least-squares problems. Our main
contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach
of the VarPro idea to non-quadratic objective functions, most notably,
cross-entropy loss functions arising in classification. These extensions make
GNvpro applicable to all training problems that involve a DNN whose last layer
is an affine mapping, which is common in many state-of-the-art architectures.
In our four numerical experiments from surrogate modeling, segmentation, and
classification, GNvpro solves the optimization problem more efficiently than
commonly-used stochastic gradient descent (SGD) schemes. Also, GNvpro finds
solutions that generalize well to unseen data points, in all but one example
better than well-tuned SGD methods.
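To make the separable structure behind VarPro concrete: a DNN whose last layer is affine can be written as W phi(y; theta) + b, where phi(y; theta) collects all hidden layers. For a quadratic (least-squares) loss, the optimal (W, b) for fixed theta is a linear least-squares solution, so those weights can be eliminated and the remaining optimization runs over theta only; GNvpro applies a Gauss-Newton scheme to this reduced problem and extends the idea to non-quadratic losses such as cross-entropy. The sketch below illustrates only the elimination step, using a plain Adam update on theta instead of the paper's Gauss-Newton step; the toy data, layer sizes, and names such as feature_net are illustrative assumptions rather than the paper's setup.

```python
# Minimal VarPro-style training sketch for a network with an affine last layer.
# Assumptions: toy regression data, a small Tanh feature extractor, and a
# first-order (Adam) step on the reduced objective instead of Gauss-Newton.
import torch

torch.manual_seed(0)

# Toy regression data: targets c = f(y) + noise.
n, d_in, d_out = 512, 4, 2
Y = torch.randn(n, d_in)
C = torch.sin(Y @ torch.randn(d_in, d_out)) + 0.01 * torch.randn(n, d_out)

# Nonlinear feature extractor phi(y; theta): every layer except the final affine one.
feature_net = torch.nn.Sequential(
    torch.nn.Linear(d_in, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
)
opt = torch.optim.Adam(feature_net.parameters(), lr=1e-3)

def solve_last_layer(Phi, C):
    """Inner, convex problem: closed-form min_W ||[Phi, 1] W - C||_F^2 (W includes the bias row)."""
    Phi1 = torch.cat([Phi, torch.ones(Phi.shape[0], 1)], dim=1)
    return torch.linalg.lstsq(Phi1, C).solution

for it in range(200):
    Phi = feature_net(Y)                      # features depend on theta
    W = solve_last_layer(Phi.detach(), C)     # eliminate the affine layer for the current theta
    # Reduced objective: because the gradient in W vanishes at W*(theta), treating W as a
    # constant while backpropagating through Phi gives the exact gradient in theta.
    Phi1 = torch.cat([Phi, torch.ones(n, 1)], dim=1)
    loss = 0.5 * ((Phi1 @ W - C) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("reduced-objective training loss:", loss.item())
```

For the cross-entropy case handled by GNvpro, the inner problem is no longer a linear least-squares solve; it becomes a small convex optimization in the last-layer weights (multinomial logistic regression), but the outer reduced problem over theta keeps the same shape.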
Related papers
- DiffGrad for Physics-Informed Neural Networks [0.0]
Burgers' equation, a fundamental equation in fluid dynamics that is used extensively with PINNs, provides flexible results with Adamprop.
This paper introduces a novel strategy for solving Burgers' equation by incorporating DiffGrad with PINNs.
arXiv Detail & Related papers (2024-09-05T04:39:35Z) - Enhancing GNNs Performance on Combinatorial Optimization by Recurrent Feature Update [0.09986418756990156]
We introduce a novel algorithm, denoted hereafter as QRF-GNN, leveraging the power of GNNs to efficiently solve combinatorial optimization (CO) problems.
It relies on unsupervised learning by minimizing the loss function derived from QUBO relaxation.
Experimental results show that QRF-GNN drastically surpasses existing learning-based approaches and is comparable to state-of-the-art conventional solvers (a generic sketch of the QUBO-relaxation loss appears after this list).
arXiv Detail & Related papers (2024-07-23T13:34:35Z) - Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for the weights of ANNs is a large-scale optimization problem.
arXiv Detail & Related papers (2024-02-20T01:47:25Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Data-informed Deep Optimization [3.331457049134526]
We propose a data-informed deep optimization (DiDo) approach to solve high-dimensional design problems.
We use a deep neural network (DNN) to learn the feasible region and to sample feasible points for fitting the objective function.
Our results indicate that the DiDo approach, empowered by DNNs, is flexible and promising for solving general high-dimensional design problems in practice.
arXiv Detail & Related papers (2021-07-17T02:53:54Z) - dNNsolve: an efficient NN-based PDE solver [62.997667081978825]
We introduce dNNsolve, which makes use of dual neural networks to solve ODEs/PDEs.
We show that dNNsolve is capable of solving a broad range of ODEs/PDEs in 1, 2 and 3 spacetime dimensions.
arXiv Detail & Related papers (2021-03-15T19:14:41Z) - Deep Neural Networks Are Effective At Learning High-Dimensional
Hilbert-Valued Functions From Limited Data [6.098254376499899]
We focus on approximating functions that are Hilbert-valued, i.e., take values in a separable, but typically infinite-dimensional, Hilbert space.
We present a novel result on DNN training for holomorphic functions with so-called hidden anisotropy.
We show that there exists a procedure for learning Hilbert-valued functions via DNNs that performs as well as, but no better than, current best-in-class schemes.
arXiv Detail & Related papers (2020-12-11T02:02:14Z) - Towards an Efficient and General Framework of Robust Training for Graph
Neural Networks [96.93500886136532]
Graph Neural Networks (GNNs) have made significant advances on several fundamental inference tasks.
Despite GNNs' impressive performance, it has been observed that carefully crafted perturbations on graph structures lead them to make wrong predictions.
We propose a general framework which leverages greedy search algorithms and zeroth-order methods to obtain robust GNNs.
arXiv Detail & Related papers (2020-02-25T15:17:58Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed online learning optimization integrates a deep neural network (DNN) with finite element method (FEM) calculations.
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with direct use of these methods and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.