Efficient Learning of Generative Models via Finite-Difference Score
Matching
- URL: http://arxiv.org/abs/2007.03317v2
- Date: Wed, 25 Nov 2020 16:10:07 GMT
- Title: Efficient Learning of Generative Models via Finite-Difference Score
Matching
- Authors: Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu
- Abstract summary: We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
- Score: 111.55998083406134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several machine learning applications involve the optimization of
higher-order derivatives (e.g., gradients of gradients) during training, which
can be expensive in terms of memory and computation, even with automatic
differentiation. As a typical example in generative modeling, score matching
(SM) involves the optimization of the trace of a Hessian. To improve computing
efficiency, we rewrite the SM objective and its variants in terms of
directional derivatives, and present a generic strategy to efficiently
approximate any-order directional derivative with finite difference (FD). Our
approximation only involves function evaluations, which can be executed in
parallel, and no gradient computations. Thus, it reduces the total
computational cost while also improving numerical stability. We provide two
instantiations by reformulating variants of SM objectives into the FD forms.
Empirically, we demonstrate that our methods produce results comparable to the
gradient-based counterparts while being much more computationally efficient.
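To make the finite-difference idea concrete, the sketch below is a minimal illustration, not the authors' implementation; the function name `fd_directional_derivatives`, the toy test function, and the step size `eps` are assumptions for this example. It shows how first- and second-order directional derivatives can be approximated from function evaluations alone:

```python
import numpy as np

def fd_directional_derivatives(f, x, v, eps=1e-3):
    """Central finite-difference approximations of v^T grad f(x) and
    v^T Hessian(f)(x) v from three function evaluations, which could be
    batched or run in parallel; no gradients are computed."""
    f_plus = f(x + eps * v)
    f_minus = f(x - eps * v)
    f_mid = f(x)
    first = (f_plus - f_minus) / (2.0 * eps)             # ~ v^T grad f(x)
    second = (f_plus - 2.0 * f_mid + f_minus) / eps**2   # ~ v^T H(x) v
    return first, second

# Toy check with f(x) = 0.5 * ||x||^2: v^T grad f = v^T x and v^T H v = ||v||^2.
x, v = np.random.randn(5), np.random.randn(5)
d1, d2 = fd_directional_derivatives(lambda z: 0.5 * np.dot(z, z), x, v)
print(d1, np.dot(v, x))  # agree up to O(eps^2) error
print(d2, np.dot(v, v))  # agree up to O(eps^2) error
```

When the projection vector v is drawn from a distribution with identity covariance (e.g., standard Gaussian), the expectation of v^T H v recovers the Hessian trace appearing in the SM objective, which is the kind of quantity the FD reformulations above target.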
Related papers
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z) - Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions [3.889230974713832]
We develop a novel scoring rule for comparing continuous Markov processes.
This scoring rule allows us to bypass the computational overhead associated with signature kernels.
We demonstrate that the proposed method, FDM, achieves superior performance, consistently outperforming existing methods in both computational efficiency and generative quality.
arXiv Detail & Related papers (2024-10-04T23:26:38Z) - DOF: Accelerating High-order Differential Operators with Forward
Propagation [40.71528485918067]
We propose an efficient framework, Differential Operator with Forward-propagation (DOF), for calculating general second-order differential operators without losing any precision.
We demonstrate a twofold improvement in efficiency and reduced memory consumption across architectures.
Empirical results show that our method surpasses traditional automatic differentiation (AutoDiff) techniques, achieving roughly a 2x improvement when exploiting structure and nearly 20x when exploiting sparsity.
arXiv Detail & Related papers (2024-02-15T05:59:21Z) - Efficient and Sound Differentiable Programming in a Functional
Array-Processing Language [4.1779847272994495]
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program.
We present an AD system for a higher-order functional array-processing language.
In combination, these techniques make computation with forward-mode AD as efficient as reverse mode.
arXiv Detail & Related papers (2022-12-20T14:54:47Z) - Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z) - Robust Regression via Model Based Methods [13.300549123177705]
We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function.
We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [50], to solve the inner optimization in MBO.
Finally, we demonstrate experimentally (a) the robustness of l_p norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient-based methods on autoencoders and multi-target regression.
arXiv Detail & Related papers (2021-06-20T21:45:35Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost (a generic two-point ZO gradient estimate of this kind is sketched after this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z) - Randomized Automatic Differentiation [22.95414996614006]
We develop a general framework and approach for randomized automatic differentiation (RAD).
RAD allows unbiased gradient estimates to be computed with reduced memory in return for increased variance.
We show that RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks.
arXiv Detail & Related papers (2020-07-20T19:03:44Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
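As referenced in the zeroth-order optimization entry above, ZO methods trade gradient computations for function queries, the same trade made by the finite-difference approach of the main paper. Below is a minimal sketch of a standard two-point ZO gradient estimator with Gaussian sampling directions; it is an illustrative baseline, not that paper's coordinate importance-sampling scheme, and the name `zo_gradient` and its parameters are assumptions.

```python
import numpy as np

def zo_gradient(f, x, num_queries=20, mu=1e-2, rng=None):
    """Two-point zeroth-order gradient estimate: average directional
    finite differences along random Gaussian directions, using only
    function evaluations (no backpropagation)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape[0])
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_queries

# Toy check on f(x) = ||x||^2, whose true gradient is 2x.
x = np.ones(4)
print(zo_gradient(lambda z: np.dot(z, z), x, num_queries=2000))  # ~ [2, 2, 2, 2]
```

Since E[u u^T] = I for standard Gaussian u, the estimate is approximately unbiased for the true gradient up to an O(mu^2) smoothing error, which is why the query count and the choice of sampling distribution govern its efficiency.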
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.