Efficient Learning of Generative Models via Finite-Difference Score
Matching
- URL: http://arxiv.org/abs/2007.03317v2
- Date: Wed, 25 Nov 2020 16:10:07 GMT
- Title: Efficient Learning of Generative Models via Finite-Difference Score
Matching
- Authors: Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu
- Abstract summary: We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
- Score: 111.55998083406134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several machine learning applications involve the optimization of
higher-order derivatives (e.g., gradients of gradients) during training, which
can be expensive in terms of memory and computation, even with automatic
differentiation. As a typical example in generative modeling, score matching
(SM) involves the optimization of the trace of a Hessian. To improve computing
efficiency, we rewrite the SM objective and its variants in terms of
directional derivatives, and present a generic strategy to efficiently
approximate any-order directional derivative with finite difference (FD). Our
approximation only involves function evaluations, which can be executed in
parallel, and no gradient computations. Thus, it reduces the total
computational cost while also improving numerical stability. We provide two
instantiations by reformulating variants of SM objectives into the FD forms.
Empirically, we demonstrate that our methods produce results comparable to the
gradient-based counterparts while being much more computationally efficient.
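To make the finite-difference idea concrete, the sketch below is a minimal illustration, not the authors' implementation; the function name `fd_directional_derivatives`, the toy test function, and the step size `eps` are assumptions for this example. It shows how first- and second-order directional derivatives can be approximated from function evaluations alone:

```python
import numpy as np

def fd_directional_derivatives(f, x, v, eps=1e-3):
    """Central finite-difference approximations of v^T grad f(x) and
    v^T Hessian(f)(x) v from three function evaluations, which could be
    batched or run in parallel; no gradients are computed."""
    f_plus = f(x + eps * v)
    f_minus = f(x - eps * v)
    f_mid = f(x)
    first = (f_plus - f_minus) / (2.0 * eps)             # ~ v^T grad f(x)
    second = (f_plus - 2.0 * f_mid + f_minus) / eps**2   # ~ v^T H(x) v
    return first, second

# Toy check with f(x) = 0.5 * ||x||^2: v^T grad f = v^T x and v^T H v = ||v||^2.
x, v = np.random.randn(5), np.random.randn(5)
d1, d2 = fd_directional_derivatives(lambda z: 0.5 * np.dot(z, z), x, v)
print(d1, np.dot(v, x))  # agree up to O(eps^2) error
print(d2, np.dot(v, v))  # agree up to O(eps^2) error
```

When the projection vector v is drawn from a distribution with identity covariance (e.g., standard Gaussian), the expectation of v^T H v recovers the Hessian trace appearing in the SM objective, which is the kind of quantity the FD reformulations above target.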
Related papers
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z) - Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions [3.889230974713832]
We develop a novel scoring rule for comparing continuous Markov processes.
This scoring rule allows us to bypass the computational overhead associated with signature kernels.
We demonstrate that the proposed method, FDM, achieves superior performance, consistently outperforming existing methods in both computational efficiency and generative quality.
arXiv Detail & Related papers (2024-10-04T23:26:38Z) - DOF: Accelerating High-order Differential Operators with Forward
Propagation [40.71528485918067]
We propose an efficient framework, Differential Operator with Forward-propagation (DOF), for calculating general second-order differential operators without losing any precision.
We demonstrate a twofold improvement in efficiency and reduced memory consumption across architectures.
Empirical results show that our method surpasses traditional automatic differentiation (AutoDiff) techniques, achieving roughly a 2x improvement when exploiting structure and nearly 20x when exploiting sparsity.
arXiv Detail & Related papers (2024-02-15T05:59:21Z) - Efficient and Sound Differentiable Programming in a Functional
Array-Processing Language [4.1779847272994495]
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program.
We present an AD system for a higher-order functional array-processing language.
In combination, these techniques make computation with forward-mode AD as efficient as reverse mode.
arXiv Detail & Related papers (2022-12-20T14:54:47Z) - Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z) - Robust Regression via Model Based Methods [13.300549123177705]
We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function.
We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [50], to solve the inner optimization in MBO.
Finally, we demonstrate experimentally (a) the robustness of l_p norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient-based methods on autoencoders and multi-target regression.
arXiv Detail & Related papers (2021-06-20T21:45:35Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost (a generic two-point ZO gradient estimate of this kind is sketched after this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z) - Randomized Automatic Differentiation [22.95414996614006]
We develop a general framework and approach for randomized automatic differentiation (RAD).
RAD allows unbiased gradient estimates to be computed with reduced memory in return for increased variance.
We show that RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks.
arXiv Detail & Related papers (2020-07-20T19:03:44Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
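As referenced in the zeroth-order optimization entry above, ZO methods trade gradient computations for function queries, the same trade made by the finite-difference approach of the main paper. Below is a minimal sketch of a standard two-point ZO gradient estimator with Gaussian sampling directions; it is an illustrative baseline, not that paper's coordinate importance-sampling scheme, and the name `zo_gradient` and its parameters are assumptions.

```python
import numpy as np

def zo_gradient(f, x, num_queries=20, mu=1e-2, rng=None):
    """Two-point zeroth-order gradient estimate: average directional
    finite differences along random Gaussian directions, using only
    function evaluations (no backpropagation)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape[0])
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_queries

# Toy check on f(x) = ||x||^2, whose true gradient is 2x.
x = np.ones(4)
print(zo_gradient(lambda z: np.dot(z, z), x, num_queries=2000))  # ~ [2, 2, 2, 2]
```

Since E[u u^T] = I for standard Gaussian u, the estimate is approximately unbiased for the true gradient up to an O(mu^2) smoothing error, which is why the query count and the choice of sampling distribution govern its efficiency.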
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.