Related papers: On the Interpolation Effect of Score Smoothing in Diffusion Models


Smoothing the Score Function for Generalization in Diffusion Models: An Optimization-based Explanation Framework [18.032864089341327]
Diffusion models achieve remarkable generation quality, yet face a fundamental challenge known as memorization. We develop a theoretical framework to explain this phenomenon by showing that the empirical score function is a weighted sum of the score functions of Gaussian distributions. In practice, approximating the empirical score function with a neural network can partially alleviate this issue and improve generalization.
arXiv Detail & Related papers (2026-01-27T07:16:44Z)
On the Theory of Continual Learning with Gradient Descent for Neural Networks [30.678616374316736]
We study the limitations of continual learning in a tractable yet representative setting. Our results reveal interesting phenomena on the role of different problem parameters in the rate of forgetting.
arXiv Detail & Related papers (2025-10-07T04:32:27Z)
Multimodal Atmospheric Super-Resolution With Deep Generative Models [1.9367648935513015]
Score-based diffusion modeling is a generative machine learning algorithm that can be used to sample from complex distributions. In this article, we apply such a concept to the super-resolution of a high-dimensional dynamical system, given the real-time availability of low-resolution and experimentally observed sparse sensor measurements.
arXiv Detail & Related papers (2025-06-28T06:47:09Z)
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z)
Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [11.743167854433306]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. In this work, we establish the first (nearly) dimension-free sample complexity bounds for learning these score functions. A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z)
Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization [12.812942188697326]
Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy.
arXiv Detail & Related papers (2024-01-28T08:13:56Z)
Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction. We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
Learning invariant representations of time-homogeneous stochastic dynamical systems [27.127773672738535]
We study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental to learning the transfer operator or the generator of the system. We show that the search for a good representation can be cast as an optimization problem over neural networks.
arXiv Detail & Related papers (2023-07-19T11:32:24Z)
Seismic Data Interpolation via Denoising Diffusion Implicit Models with Coherence-corrected Resampling [7.755439545030289]
Deep learning models such as U-Net often underperform when the training and test missing patterns do not match. We propose a novel framework that is built upon multi-modal diffusion models. In the inference phase, we introduce the denoising diffusion implicit model to reduce the number of sampling steps. To enhance the coherence and continuity between the revealed traces and the missing traces, we propose two strategies.
arXiv Detail & Related papers (2023-07-09T16:37:47Z)
Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles. We demonstrate that subsampling shifts the double-descent peak of a linear predictor. We compare the performance of a feature-subsampling ensemble to a single linear predictor.
arXiv Detail & Related papers (2023-07-06T17:56:06Z)
Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
Deep Double Descent via Smooth Interpolation [2.141079906482723]
We quantify sharpness of fit of training data by studying the loss landscape w.r.t. the input variable locally to each training point. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy targets. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, in contrast to existing intuition.
arXiv Detail & Related papers (2022-09-21T02:46:13Z)
CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error. We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting. We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
Model discovery in the sparse sampling regime [0.0]
We show how deep learning can improve model discovery of partial differential equations. As a result, deep learning-based model discovery makes it possible to recover the underlying equations. We illustrate our claims on both synthetic and experimental sets.
arXiv Detail & Related papers (2021-05-02T06:27:05Z)
Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated. We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity. We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective. Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.