Related papers: Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

URL: http://arxiv.org/abs/2401.15604v3
Date: Wed, 13 Mar 2024 01:25:26 GMT
Title: Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization
Authors: Yinbin Han, Meisam Razaviyayn, Renyuan Xu
Abstract summary: Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy.
Score: 12.812942188697326
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy. As a first step toward answering this question, this paper establishes a mathematical framework for analyzing score estimation using neural networks trained by gradient descent. Our analysis covers both the optimization and the generalization aspects of the learning procedure. In particular, we propose a parametric form to formulate the denoising score-matching problem as a regression with noisy labels. Compared to the standard supervised learning setup, the score-matching problem introduces distinct challenges, including unbounded input, vector-valued output, and an additional time variable, preventing existing techniques from being applied directly. In this paper, we show that with proper designs, the evolution of neural networks during training can be accurately modeled by a series of kernel regression tasks. Furthermore, by applying an early-stopping rule for gradient descent and leveraging recent developments in neural tangent kernels, we establish the first generalization error (sample complexity) bounds for learning the score function with neural networks, despite the presence of noise in the observations. Our analysis is grounded in a novel parametric form of the neural network and an innovative connection between score matching and regression analysis, facilitating the application of advanced statistical and optimization techniques.

Related papers

Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks [0.6319731355340598]
In practice, one often employs gradient descent type algorithms to train the neural network.<n>In this work, we establish the linear convergence of gradient descent / flow in training over-ized two layer PINNs for a general class of activation functions in the sense of high probability.
arXiv Detail & Related papers (2025-08-29T12:25:51Z)
From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems [20.006163951844357]
We propose a simulation-free framework for training neural ordinary differential equations (NODEs) We employ the Fourier analysis to estimate temporal and potential high-order spatial gradients from noisy observational data. Our approach outperforms state-of-the-art methods in terms of training time, dynamics prediction, and robustness.
arXiv Detail & Related papers (2024-05-19T13:15:23Z)
Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies [35.29595714883275]
We develop an efficient sketch-based approximation to the Nadaraya-Watson estimator. Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
arXiv Detail & Related papers (2023-11-22T18:40:18Z)
The limitation of neural nets for approximation and optimization [0.0]
We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems.
arXiv Detail & Related papers (2023-11-21T00:21:15Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs) It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers. We show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods. Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z)
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches. It is still unclear why randomly gradient descent algorithms reach their limits.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.