A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks
- URL: http://arxiv.org/abs/2412.09752v1
- Date: Thu, 12 Dec 2024 22:57:28 GMT
- Title: A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks
- Authors: Kyle R. Chickering
- Abstract summary: $n$-TangentProp computes the exact derivative $d^n/dx^n f(x)$ in quasilinear rather than exponential time.
We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks.
- Abstract: The use of neural networks for solving differential equations is practically difficult due to the exponentially increasing runtime of autodifferentiation when computing high-order derivatives. We propose $n$-TangentProp, the natural extension of the TangentProp formalism \cite{simard1991tangent} to arbitrarily many derivatives. $n$-TangentProp computes the exact derivative $d^n/dx^n f(x)$ in quasilinear, instead of exponential, time for a densely connected, feed-forward neural network $f$ with a smooth, parameter-free activation function. We validate our algorithm empirically across a range of depths, widths, and numbers of derivatives. We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks, where $n$-TangentProp allows for significantly faster training times than previous methods and has favorable scaling with respect to both model size and loss-function complexity as measured by the number of required derivatives. The code for this paper can be found at https://github.com/kyrochi/n_tangentprop.
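The exponential blow-up the abstract refers to comes from nesting reverse-mode autodiff $n$ times; Taylor-mode propagation avoids it by pushing a truncated Taylor series through the network in a single forward pass. The sketch below illustrates that core idea for a scalar-input tanh MLP. It is a hedged illustration only: it runs in polynomial (not the paper's quasilinear) time, and all function and variable names are illustrative rather than taken from the paper's code.

```python
import math
import numpy as np

def tanh_series(h):
    """Taylor coefficients of tanh(h(t)) from those of h(t), via the
    ODE u' = (1 - u^2) h', applied elementwise over the layer width."""
    n = len(h) - 1
    u = [np.tanh(h[0])]
    for k in range(n):
        acc = np.zeros_like(h[0])
        for j in range(k + 1):
            u2_j = sum(u[i] * u[j - i] for i in range(j + 1))  # (u^2)_j
            acc = acc + ((1.0 if j == 0 else 0.0) - u2_j) * (k - j + 1) * h[k - j + 1]
        u.append(acc / (k + 1))  # (k+1) u_{k+1} = [(1 - u^2) h']_k
    return u

def mlp_taylor(weights, biases, x0, n):
    """Propagate a degree-n Taylor expansion of f(x0 + t) through a
    scalar-input tanh MLP; the final layer is linear (no activation)."""
    # input series x0 + t has coefficients [x0, 1, 0, ..., 0]
    coeffs = [np.array([x0]), np.array([1.0])] + [np.array([0.0]) for _ in range(n - 1)]
    for layer, (W, b) in enumerate(zip(weights, biases)):
        coeffs = [W @ c for c in coeffs]   # linear maps act coefficient-wise
        coeffs[0] = coeffs[0] + b          # bias only shifts the 0th coefficient
        if layer < len(weights) - 1:       # hidden layers use tanh
            coeffs = tanh_series(coeffs)
    return coeffs                          # f^(k)(x0) = k! * coeffs[k]

# Sanity check: two unit-weight layers give f(x) = tanh(x), whose
# Maclaurin series is t - t^3/3 + 2 t^5/15 + ...
W = [np.array([[1.0]]), np.array([[1.0]])]
b = [np.array([0.0]), np.array([0.0])]
c = mlp_taylor(W, b, x0=0.0, n=5)
derivs = [math.factorial(k) * ck[0] for k, ck in enumerate(c)]
print(derivs[3], derivs[5])  # ≈ -2.0 and ≈ 16.0 (f'''(0) = -2, f^(5)(0) = 16)
```

Note that every coefficient is computed once per layer, so the cost grows polynomially in $n$ instead of doubling with each nested autodiff call; the paper's algorithm refines this further to quasilinear time.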
Related papers
- Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators [29.063441432499776]
We show how to efficiently perform arbitrary contraction of the derivative tensor of arbitrary order for multivariate functions.
When applied to Physics-Informed Neural Networks (PINNs), our method provides >1000$\times$ speed-up and 30$\times$ memory reduction over randomization with first-order AD.
arXiv Detail & Related papers (2024-11-27T09:37:33Z)
- Neural Networks and (Virtual) Extended Formulations [5.762677915745415]
We prove lower bounds on the size of neural networks that optimize over $P$.
We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$.
arXiv Detail & Related papers (2024-11-05T11:12:11Z)
- Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives [0.0]
We focus on the exact and explicit computation of projections of the Hessian and higher-order derivatives on well-chosen subspaces.
We propose an optimization method exploiting tensors at order 2 and 3 with several interesting properties.
arXiv Detail & Related papers (2023-12-06T20:24:05Z)
- HOPE: High-order Polynomial Expansion of Black-box Neural Networks [7.156504968033132]
We introduce HOPE (High-order Polynomial Expansion), a method for expanding a network into a high-order Taylor polynomial at a reference input.
Numerical analysis confirms the high accuracy, low computational complexity, and good convergence of the proposed method.
We demonstrate HOPE's wide applications built on deep learning, including function discovery, fast inference, and feature selection.
arXiv Detail & Related papers (2023-07-17T01:46:15Z)
- Scaling Gaussian Processes with Derivative Information Using Variational Inference [17.746842802181256]
We introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference.
We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed.
arXiv Detail & Related papers (2021-07-08T18:23:59Z)
- Higher-order Derivatives of Weighted Finite-state Machines [68.43084108204741]
This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines.
We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature.
Our algorithm is significantly faster than prior algorithms.
arXiv Detail & Related papers (2021-06-01T19:51:55Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$Nets are a new class of function approximators based on polynomial expansions.
$\Pi$Nets produce state-of-the-art results in three challenging tasks: image generation, face verification, and 3D mesh representation learning.
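To make "polynomial expansion" concrete, here is a minimal sketch of a degree-2 polynomial block in the spirit of such networks: a Hadamard product of two linear maps contributes second-order interactions on top of a first-order term. This is an illustration only — the $\Pi$Nets paper's actual CCP/NCP tensor factorizations are more elaborate, and the names `U1`, `U2`, `C` are assumptions, not its API.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_block(x, U1, U2, C):
    """Illustrative degree-2 polynomial block: the elementwise product
    (U1 x) * (U2 x) carries quadratic interactions, U1 x the linear part;
    C mixes the hidden features down to the output."""
    z = (U1 @ x) * (U2 @ x) + U1 @ x
    return C @ z

d, h = 4, 8  # input dimension, hidden width (arbitrary for the demo)
U1 = rng.normal(size=(h, d))
U2 = rng.normal(size=(h, d))
C = rng.normal(size=(1, h))
x = rng.normal(size=d)

# Because the block is exactly quadratic + linear with no bias, its
# homogeneous parts can be separated by evaluating at x and -x:
fx, fmx = poly_block(x, U1, U2, C), poly_block(-x, U1, U2, C)
quad, lin = (fx + fmx) / 2, (fx - fmx) / 2
# and the polynomial structure predicts f(2x) = 4*quad + 2*lin exactly.
print(np.allclose(poly_block(2 * x, U1, U2, C), 4 * quad + 2 * lin))
```

The scaling check at the end is what distinguishes a genuine polynomial expansion from a generic nonlinearity: each homogeneous degree scales with its own power of the input.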
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm that our multipole graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.