Information Newton's flow: second-order optimization method in
probability space
- URL: http://arxiv.org/abs/2001.04341v4
- Date: Wed, 5 Aug 2020 01:28:44 GMT
- Title: Information Newton's flow: second-order optimization method in
probability space
- Authors: Yifei Wang and Wuchen Li
- Abstract summary: We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows.
A known fact is that overdamped Langevin dynamics correspond to Wasserstein gradient flows of Kullback-Leibler divergence.
We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families.
- Score: 10.340665633567083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a framework for Newton's flows in probability space with
information metrics, named information Newton's flows. Here two information
metrics are considered, including both the Fisher-Rao metric and the
Wasserstein-2 metric. A known fact is that overdamped Langevin dynamics
correspond to Wasserstein gradient flows of Kullback-Leibler (KL) divergence.
Extending this fact to Wasserstein Newton's flows, we derive Newton's Langevin
dynamics. We provide examples of Newton's Langevin dynamics in both
one-dimensional space and Gaussian families. For the numerical implementation,
we design sampling efficient variational methods in affine models and
reproducing kernel Hilbert space (RKHS) to approximate Wasserstein Newton's
directions. We also establish convergence results of the proposed information
Newton's method with approximated directions. Several numerical examples from
Bayesian sampling problems are shown to demonstrate the effectiveness of the
proposed method.
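The "known fact" cited in the abstract is a standard result, written out here for context (it is not specific to this paper): for a target density pi proportional to exp(-f), the marginal law of overdamped Langevin dynamics evolves by the Fokker-Planck equation, which is exactly the Wasserstein-2 gradient flow of the KL divergence to pi.

```latex
% Standard correspondence assumed by the abstract: overdamped Langevin dynamics
\[
  dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t, \qquad \pi \propto e^{-f},
\]
% whose marginal law \rho_t solves the Fokker--Planck equation
\[
  \partial_t \rho_t
    = \nabla\cdot(\rho_t \nabla f) + \Delta \rho_t
    = \nabla\cdot\!\Big(\rho_t\, \nabla \tfrac{\delta}{\delta\rho}\,\mathrm{KL}(\rho_t \,\|\, \pi)\Big),
\]
% i.e. the Wasserstein-2 gradient flow of \mathrm{KL}(\rho \,\|\, \pi).
```

A minimal particle sketch of this correspondence follows: the unadjusted Langevin algorithm (ULA) discretizes the SDE above, and the empirical distribution of the particles approximately follows the KL gradient flow toward pi. The target, step size, and particle count are illustrative assumptions, not taken from the paper, and this plain gradient flow does not include the paper's Newton-type preconditioning.

```python
# Minimal sketch (illustrative, not the paper's method): unadjusted Langevin
# algorithm for pi = N(0, 1), i.e. potential f(x) = x^2 / 2. Up to time
# discretization, the particle density follows the Wasserstein-2 gradient
# flow of KL(rho || pi).
import numpy as np

def grad_f(x):
    return x  # gradient of f(x) = x^2 / 2

rng = np.random.default_rng(0)
n_particles, n_steps, dt = 5000, 2000, 1e-2
x = rng.uniform(-5.0, 5.0, size=n_particles)  # initial distribution rho_0

for _ in range(n_steps):
    # Euler-Maruyama step for dX_t = -grad f(X_t) dt + sqrt(2) dB_t
    x = x - dt * grad_f(x) + np.sqrt(2.0 * dt) * rng.normal(size=n_particles)

# After many steps the empirical moments approach those of pi = N(0, 1).
print("mean ~ 0:", x.mean(), "  var ~ 1:", x.var())
```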
Related papers
- FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning [27.957498393822338]
We introduce a novel approach to tackle the cost of communicating Hessian information in federated Newton-type methods while still achieving fast convergence rates.
Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian.
We provide convergence analysis based on statistical learning for the federated Newton sketch approaches.
arXiv Detail & Related papers (2024-01-05T10:06:41Z) - Backtracking New Q-Newton's method, Newton's flow, Voronoi's diagram and
Stochastic root finding [0.0]
A new variant of Newton's method, named Backtracking New Q-Newton's method (BNQN), has strong theoretical guarantees, is easy to implement, and shows good experimental performance.
arXiv Detail & Related papers (2024-01-02T15:37:47Z) - Augmented Newton Method for Optimization: Global Linear Rate and
Momentum Interpretation [0.0]
We propose two novel variants of Newton's method for solving unconstrained problems, namely the Penalty Newton method and the Augmented Newton method.
arXiv Detail & Related papers (2022-05-23T04:34:10Z) - Inertial Newton Algorithms Avoiding Strict Saddle Points [0.7614628596146602]
We study the behavior of second-order algorithms mixing Newton's method and inertial gradient descent.
We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points.
arXiv Detail & Related papers (2021-11-08T16:02:45Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z) - Discriminative Bayesian filtering lends momentum to the stochastic
Newton method for minimizing log-convex functions [0.0]
We show how the stochastic Newton method iteratively updates its estimate using subsampled versions of the gradient and Hessian.
Applying Bayesian filtering, we consider the entire history of observations.
We establish matrix-based conditions under which the effect of older observations diminishes.
We illustrate various aspects of our approach with an example and discuss other innovations for the Newton method.
arXiv Detail & Related papers (2021-04-27T02:39:21Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
arXiv Detail & Related papers (2021-02-05T04:25:28Z) - Learning Unknown Physics of non-Newtonian Fluids [56.9557910899739]
We extend the physics-informed neural network (PINN) method to learn viscosity models of two non-Newtonian systems.
The PINN-inferred viscosity models agree with the empirical models for shear rates with large absolute values but deviate for shear rates near zero.
We use the PINN method to solve the momentum conservation equation for non-Newtonian fluid flow using only the boundary conditions.
arXiv Detail & Related papers (2020-08-26T20:41:36Z) - Disentangling the Gauss-Newton Method and Approximate Inference for
Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
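As a side note on the generalized Gauss-Newton (GGN) approximation mentioned in the last entry: for a squared-error loss the GGN keeps only the Jacobian outer-product part of the Hessian and is therefore always positive semi-definite, unlike the full Hessian. The sketch below is a minimal illustration under assumed toy names (model, data, damping constant); it is not code from the cited paper.

```python
# Minimal NumPy sketch of the generalized Gauss-Newton (GGN) approximation
# for a squared-error loss. The toy model g(x; theta) = theta2 * tanh(theta1 * x)
# and all names are illustrative assumptions, not taken from the paper.
import numpy as np

def model(theta, x):
    t1, t2 = theta
    return t2 * np.tanh(t1 * x)

def jacobian(theta, x):
    # d g / d theta for each sample x, shape (n_samples, n_params)
    t1, t2 = theta
    th = np.tanh(t1 * x)
    return np.stack([t2 * x * (1.0 - th**2), th], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.7 * np.tanh(1.3 * x) + 0.05 * rng.normal(size=50)

theta = np.array([1.0, 1.0])
J = jacobian(theta, x)             # per-sample Jacobians
residual = model(theta, x) - y

# GGN keeps only the outer-product curvature sum_n J_n^T J_n; for squared
# error the loss curvature w.r.t. the model output is 1, so GGN = J^T J.
# The residual-weighted second derivatives of the model are dropped.
ggn = J.T @ J

# One damped Gauss-Newton step (damping keeps the linear system well posed,
# in the spirit of Levenberg-Marquardt stabilization).
grad = J.T @ residual
step = np.linalg.solve(ggn + 1e-3 * np.eye(2), grad)
theta_new = theta - step
print("GGN:\n", ggn, "\nupdated theta:", theta_new)
```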
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.