A Mean-Field Theory for Learning the Schönberg Measure of Radial Basis Functions
- URL: http://arxiv.org/abs/2006.13330v2
- Date: Fri, 3 Jul 2020 13:43:31 GMT
- Title: A Mean-Field Theory for Learning the Schönberg Measure of Radial Basis Functions
- Authors: Masoud Badiei Khuzani, Yinyu Ye, Sandy Napel, Lei Xing
- Abstract summary: We learn the distribution in the Schönberg integral representation of the radial basis functions from training samples.
We prove that in the scaling limits, the empirical measure of the Langevin particles converges to the law of a reflected Itô diffusion-drift process.
- Score: 13.503048325896174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop and analyze a projected particle Langevin optimization method to
learn the distribution in the Schönberg integral representation of radial basis
functions from training samples. More specifically, we characterize a
distributionally robust optimization method with respect to the Wasserstein
distance to optimize the distribution in the Schönberg integral representation.
To provide theoretical performance guarantees, we analyze the scaling limits of
a projected particle online (stochastic) optimization method in the mean-field
regime. In particular, we prove that in the scaling limits, the empirical
measure of the Langevin particles converges to the law of a reflected Itô
diffusion-drift process, where the drift is itself a function of the law of the
underlying process. Using Itô's lemma for semi-martingales and Girsanov's
change of measure for Wiener processes, we then derive a McKean-Vlasov type
partial differential equation (PDE) with Robin boundary conditions that
describes the evolution of the empirical measure of the projected Langevin
particles in the mean-field regime. In addition, we establish the existence and
uniqueness of steady-state solutions of the derived PDE in the weak sense. We
apply our learning approach to train radial kernels in kernel locality-sensitive
hash (LSH) functions, where the training dataset is generated via $k$-means
clustering of a small subset of the database. We subsequently apply our kernel
LSH with a trained kernel to an image retrieval task on the MNIST dataset and
demonstrate the efficacy of our kernel learning approach. We also apply our
kernel learning approach in conjunction with kernel support vector machines
(SVMs) for classification of benchmark datasets.
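For context, the Schönberg integral representation referenced above is the
classical Schoenberg characterization of radial kernels that are positive
definite on every Euclidean space: such a kernel is a scale mixture of
Gaussians, and learning the kernel amounts to learning the mixing measure mu:

\[
  k(x, y) \;=\; \varphi\bigl(\lVert x - y \rVert_2^2\bigr)
  \;=\; \int_0^{\infty} e^{-t \lVert x - y \rVert_2^2} \, \mathrm{d}\mu(t),
  \qquad \mu \ge 0, \quad \mu\bigl([0, \infty)\bigr) < \infty.
\]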
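A minimal numerical sketch of the projected particle Langevin idea, under
simplifying assumptions: the measure mu is represented by finitely many
particles t_1, ..., t_N, which take noisy gradient steps on a squared
kernel-matching loss and are projected back onto a bounded interval. The loss,
the clipping-based projection, and all names (fit_schoenberg_particles, t_max,
beta) are illustrative stand-ins; the paper's distributionally robust objective
and reflected dynamics are more involved.

import numpy as np

rng = np.random.default_rng(0)

def kernel_from_particles(t, sq_dists):
    # k_hat(d^2) = (1/N) * sum_i exp(-t_i * d^2), the particle approximation
    # of the Schoenberg integral.
    return np.exp(-np.outer(sq_dists, t)).mean(axis=1)

def fit_schoenberg_particles(sq_dists, targets, n_particles=64, n_steps=2000,
                             step=1e-2, beta=1e3, t_max=10.0):
    # Projected Langevin dynamics on the particle locations t_i in [0, t_max].
    t = rng.uniform(0.0, t_max, size=n_particles)
    for _ in range(n_steps):
        K = np.exp(-np.outer(sq_dists, t))                  # (M, N)
        resid = K.mean(axis=1) - targets                    # (M,)
        # Gradient of the mean squared loss w.r.t. each particle t_i.
        grad = (2.0 / len(sq_dists)) * (
            resid[:, None] * (-sq_dists[:, None]) * K / n_particles
        ).sum(axis=0)
        noise = np.sqrt(2.0 * step / beta) * rng.standard_normal(n_particles)
        t = np.clip(t - step * grad + noise, 0.0, t_max)    # projection step
    return t

# Toy usage: recover mu = delta_2, i.e. the kernel exp(-2 * d^2), from noise.
d2 = rng.uniform(0.0, 3.0, size=200)
y = np.exp(-2.0 * d2) + 0.01 * rng.standard_normal(200)
particles = fit_schoenberg_particles(d2, y)
print("max fit error:", np.abs(kernel_from_particles(particles, d2) - y).max())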
Related papers
- Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
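As a concrete illustration of the probabilistic representation mentioned in the
Monte Carlo entry above (a standard Feynman-Kac construction, not code from the
paper): the solution of a constant-coefficient convection-diffusion equation
u_t + a u_x = kappa u_xx with initial data g is an expectation over randomly
transported particles, which Monte Carlo can estimate pointwise.

import numpy as np

def mc_convection_diffusion(g, x, t, a=1.0, kappa=0.1, n_paths=100_000,
                            seed=0):
    # u(t, x) = E[ g(x - a*t + sqrt(2*kappa*t) * Z) ], Z ~ N(0, 1), solves
    # u_t + a u_x = kappa u_xx with u(0, .) = g: average the particle ensemble.
    z = np.random.default_rng(seed).standard_normal(n_paths)
    return g(x - a * t + np.sqrt(2.0 * kappa * t) * z).mean()

g = lambda x: np.exp(-x ** 2)        # illustrative initial condition
print(mc_convection_diffusion(g, x=0.5, t=0.2))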
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
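One plausible reading of the minimal-norm kernel selection described in the
entry above, sketched under assumptions: fit kernel ridge regression for
several candidate bandwidths and keep the kernel whose near-interpolant has the
smallest RKHS norm alpha^T K alpha. The criterion and names here are
illustrative; the paper's actual algorithms (kernel flow, spectral kernel ridge
regression) differ in detail.

import numpy as np

def rbf_gram(X, sigma):
    # Squared-exponential Gram matrix for data X of shape (n, d).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def select_bandwidth(X, y, sigmas, lam=1e-6):
    best = None
    for sigma in sigmas:
        K = rbf_gram(X, sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
        norm2 = alpha @ K @ alpha            # RKHS norm of the fitted function
        if best is None or norm2 < best[1]:
            best = (sigma, norm2)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 1))
y = np.sin(3.0 * X[:, 0])
print(select_bandwidth(X, y, sigmas=[0.1, 0.3, 1.0, 3.0]))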
- Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent [11.483919798541393]
We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics-Informed Neural Networks (PINNs).
arXiv Detail & Related papers (2022-05-15T17:01:53Z)
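For reference, the standard Sobolev training objective alluded to in the entry
above augments the L2 discrepancy with derivative terms, i.e., it minimizes an
H1-norm error (the paper's precise objective may differ):

\[
  \mathcal{L}(u) \;=\; \lVert u - u^{\ast} \rVert_{L^2}^2
  \;+\; \lVert \nabla u - \nabla u^{\ast} \rVert_{L^2}^2
  \;=\; \lVert u - u^{\ast} \rVert_{H^1}^2 .
\]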
- A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling [0.6562256987706128]
We develop a deterministic particle method for the weighted porous medium equation (WPME) and prove its convergence on bounded time intervals.
Our method has natural applications to multi-agent coverage algorithms and sampling probability measures.
arXiv Detail & Related papers (2022-02-25T19:49:05Z)
- A Kernel Learning Method for Backward SDE Filter [1.7035011973665108]
We develop a kernel learning backward SDE filter method to propagate the state of a dynamical system based on its partial noisy observations.
We introduce a kernel learning method to learn a continuous global approximation for the conditional probability density function of the target state.
Numerical experiments demonstrate that the kernel learning backward SDE filter is highly effective and efficient.
arXiv Detail & Related papers (2022-01-25T19:49:19Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean functions in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
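A small self-contained sketch of the empirical kernel mean embedding underlying
the entry above (a standard construction, not the paper's code): a sample is
represented by the average of its kernel features, and the RKHS distance
between two embeddings is the maximum mean discrepancy (MMD).

import numpy as np

def rbf(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Squared MMD = ||mu_X - mu_Y||^2 in the RKHS (biased V-statistic).
    return (rbf(X, X, sigma).mean() + rbf(Y, Y, sigma).mean()
            - 2.0 * rbf(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y = rng.normal(0.5, 1.0, size=(200, 1))
print(mmd2(X, Y))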
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high-fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
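A hedged one-dimensional illustration of the quadrature Fourier feature idea
from the SLEIPNIR entry above: replace the Monte Carlo average over random
frequencies in random Fourier features with deterministic Gauss-Hermite nodes.
The paper treats the multivariate case with derivatives; this sketch only
approximates a 1-D squared-exponential kernel.

import numpy as np

def quad_fourier_features(x, n_nodes=64, sigma=1.0):
    # Gauss-Hermite nodes/weights for the weight exp(-u^2); substituting
    # omega = sqrt(2) * u / sigma turns them into deterministic frequencies
    # for the spectral density N(0, 1/sigma^2) of the SE kernel.
    u, w = np.polynomial.hermite.hermgauss(n_nodes)
    omega = np.sqrt(2.0) * u / sigma
    scale = np.sqrt(w / np.sqrt(np.pi))
    return np.concatenate([scale * np.cos(np.outer(x, omega)),
                           scale * np.sin(np.outer(x, omega))], axis=1)

x = np.linspace(-2.0, 2.0, 5)
Phi = quad_fourier_features(x)
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
print(np.abs(Phi @ Phi.T - K_exact).max())   # small approximation error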
- Fast Estimation of Information Theoretic Learning Descriptors using Explicit Inner Product Spaces [4.5497405861975935]
Kernel methods form a theoretically grounded, powerful, and versatile framework for solving nonlinear problems in signal processing and machine learning.
Recently, we proposed no-trick (NT) kernel adaptive filtering (KAF).
We focus on a family of fast, scalable, and accurate estimators for ITL using explicit inner product space kernels.
arXiv Detail & Related papers (2020-01-01T20:21:12Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
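For reference, the identity behind the entry above (standard Otto calculus /
JKO theory): the Fokker-Planck equation is the Wasserstein-2 gradient flow of
the relative entropy to the Gibbs measure pi proportional to exp(-U):

\[
  \partial_t \rho_t
  \;=\; \nabla \!\cdot\! \bigl( \rho_t \nabla U \bigr) + \Delta \rho_t
  \;=\; \nabla \!\cdot\! \Bigl( \rho_t \, \nabla
  \tfrac{\delta}{\delta \rho} \, \mathrm{KL}(\rho_t \,\Vert\, \pi) \Bigr).
\]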
This list is automatically generated from the titles and abstracts of the papers on this site.