On Correctness of Automatic Differentiation for Non-Differentiable
Functions
- URL: http://arxiv.org/abs/2006.06903v2
- Date: Mon, 26 Oct 2020 04:44:54 GMT
- Title: On Correctness of Automatic Differentiation for Non-Differentiable
Functions
- Authors: Wonyeol Lee, Hangyeol Yu, Xavier Rival, Hongseok Yang
- Abstract summary: We show that autodiff systems are, in a formal sense, correct when applied to non-differentiable functions.
We propose a new type of derivatives, called intensional derivatives, and prove that these derivatives always exist and coincide with standard derivatives for almost all inputs.
In this way, we formally establish the correctness of autodiff systems applied to non-differentiable functions.
- Score: 14.222887950206658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiation lies at the core of many machine-learning algorithms, and is
well-supported by popular autodiff systems, such as TensorFlow and PyTorch.
These systems were originally developed to compute derivatives of
differentiable functions, but in practice they are commonly applied to
functions with non-differentiabilities. For instance, neural networks using
ReLU define non-differentiable functions in general, but the gradients of
losses involving those functions are computed using autodiff systems in
practice. This status quo raises a natural question: are autodiff systems
correct in any formal sense when they are applied to such non-differentiable
functions? In this paper, we provide a positive answer to this question. Using
counterexamples, we first point out flaws in often-used informal arguments,
such as: non-differentiabilities arising in deep learning do not cause any
issues because they form a measure-zero set. We then investigate a class of
functions, called PAP functions, that includes nearly all (possibly
non-differentiable) functions in deep learning nowadays. For these PAP
functions, we propose a new type of derivatives, called intensional
derivatives, and prove that these derivatives always exist and coincide with
standard derivatives for almost all inputs. We also show that these intensional
derivatives are essentially what most autodiff systems compute or try to
compute. In this way, we formally establish the correctness of autodiff
systems applied to non-differentiable functions.
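As a concrete illustration of the setting (not drawn from the paper), the short sketch below assumes a standard PyTorch installation; the values reported at kink points may differ across frameworks and versions. It shows autodiff returning a derivative for ReLU at its non-differentiable point, and a ReLU-based rewriting of the identity function for which autodiff disagrees with the true derivative at a single, measure-zero input.

```python
# Minimal sketch (assumes PyTorch; kink-point values are framework-dependent)
# of the behavior the paper formalizes with intensional derivatives.
import torch

# 1) ReLU is not differentiable at 0, yet autodiff returns a value there.
x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
# Matches the standard derivative away from 0; at the kink PyTorch reports 0,
# one admissible choice of derivative value.
print(x.grad)  # e.g. tensor([0., 0., 1.])

# 2) Why "non-differentiable points form a measure-zero set" is not, by itself,
#    a correctness argument: h(z) = relu(z) - relu(-z) equals z everywhere, so
#    its true derivative is 1, but at z = 0 the chain rule composes the chosen
#    kink values and reports 0. Such disagreements occur only on a measure-zero
#    set of inputs, which is the kind of statement the paper makes precise.
z = torch.tensor(0.0, requires_grad=True)
h = torch.relu(z) - torch.relu(-z)
h.backward()
print(z.grad)  # e.g. tensor(0.), although d/dz [z] = 1
```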
Related papers
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Automatic differentiation of nonsmooth iterative algorithms [0.0]
We study nonsmooth piggyback automatic differentiation (AD) under appropriate nonexpansivity conditions.
We show that the attractor set of nonsmooth piggyback iterations is a set-valued fixed point which remains in the conservative framework.
Our results are illustrated on parametric convex optimization problems with the forward-backward, Douglas-Rachford, and Alternating Direction Method of Multipliers (ADMM) algorithms, as well as the Heavy-Ball method.
arXiv Detail & Related papers (2022-05-31T07:58:37Z) - Decoupling multivariate functions using a nonparametric filtered tensor
decomposition [0.29360071145551075]
Decoupling techniques aim at providing an alternative representation of the nonlinearity.
The so-called decoupled form is often a more efficient parameterisation of the relationship while being highly structured, favouring interpretability.
In this work two new algorithms, based on filtered tensor decompositions of first order derivative information are introduced.
arXiv Detail & Related papers (2022-05-23T09:34:17Z) - Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z) - Adjoint-aided inference of Gaussian process driven differential
equations [0.8257490175399691]
We show how the adjoint of a linear system can be used to efficiently infer forcing functions modelled as GPs.
We demonstrate the approach on systems of both ordinary and partial differential equations.
arXiv Detail & Related papers (2022-02-09T17:35:14Z) - Neural Network Approximation of Refinable Functions [8.323468006516018]
We show that refinable functions are approximated by the outputs of deep ReLU networks with fixed width and increasing depth, with accuracy that improves exponentially in the depth.
Our results apply to functions used in the standard construction of wavelets as well as to functions constructed via subdivision algorithms in Computer Aided Geometric Design.
arXiv Detail & Related papers (2021-07-28T06:45:36Z) - Efficient and Modular Implicit Differentiation [68.74748174316989]
We propose a unified, efficient and modular approach for implicit differentiation of optimization problems.
We show that seemingly simple principles allow one to recover many recently proposed implicit differentiation methods and to create new ones easily.
arXiv Detail & Related papers (2021-05-31T17:45:58Z) - A Functional Perspective on Learning Symmetric Functions with Neural
Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology based on fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - Automatic Differentiation in ROOT [62.997667081978825]
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program.
This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions.
arXiv Detail & Related papers (2020-04-09T09:18:50Z) - Densities of Almost Surely Terminating Probabilistic Programs are
Differentiable Almost Everywhere [1.911678487931003]
We study the differential properties of higher-order statistical probabilistic programs with recursion and conditioning.
A by-product of this work is that almost-surely terminating deterministic (S)PCF programs with real parameters denote functions that are differentiable almost everywhere.
arXiv Detail & Related papers (2020-04-08T10:40:14Z)