Related papers: DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

URL: http://arxiv.org/abs/2506.14202v1
Date: Tue, 17 Jun 2025 05:44:18 GMT
Title: DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion
Authors: Makoto Shing, Takuya Akiba
Abstract summary: Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process.
Score: 2.455468619225742
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, limiting accessibility to state-of-the-art AI research. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. By partitioning the network into independently trainable blocks and optimizing noise level assignments based on equal cumulative probability mass, our approach achieves significant memory efficiency while maintaining competitive performance compared to traditional backpropagation in generative tasks. Experiments on image generation and language modeling tasks demonstrate memory reduction proportional to the number of blocks while achieving superior performance. DiffusionBlocks provides a promising pathway for democratizing access to large-scale neural network training with limited computational resources.
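The abstract mentions assigning noise levels to blocks "based on equal cumulative probability mass" but gives no formula. The following is a minimal sketch of one way such a partition could be computed, not the authors' implementation. It assumes the noise level sigma is log-normally distributed; that distribution, the function name `equal_mass_boundaries`, and all default parameter values (`p_mean`, `p_std`, `sigma_min`, `sigma_max`) are illustrative assumptions that do not come from the abstract.

```python
import math
from statistics import NormalDist


def equal_mass_boundaries(num_blocks, p_mean=-1.2, p_std=1.2,
                          sigma_min=0.002, sigma_max=80.0):
    """Split [sigma_min, sigma_max] into num_blocks noise-level ranges
    that each carry the same probability mass.

    Assumption (not from the abstract): log(sigma) ~ Normal(p_mean, p_std).
    All default values are hypothetical.
    """
    dist = NormalDist(mu=p_mean, sigma=p_std)
    # Probability mass below the two ends of the truncated range.
    lo = dist.cdf(math.log(sigma_min))
    hi = dist.cdf(math.log(sigma_max))
    edges = [sigma_min]
    for i in range(1, num_blocks):
        # Quantile at which i / num_blocks of the truncated mass lies below.
        q = lo + (hi - lo) * i / num_blocks
        edges.append(math.exp(dist.inv_cdf(q)))
    edges.append(sigma_max)
    return edges


# Hypothetical usage: four blocks, block k covers [edges[k], edges[k + 1]].
edges = equal_mass_boundaries(4)
for k in range(len(edges) - 1):
    print(f"block {k}: sigma in [{edges[k]:.4f}, {edges[k + 1]:.4f}]")
```

Under this assumption, each block would be trained only on noise levels drawn from its own range, so every block sees an equal share of the sampled noise levels rather than an equal-width interval of sigma.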

Related papers

BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND). Our approach is simple but effective, first using multiple trained model weights and biases as inputs to train an autoencoder and a latent diffusion model. Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z)
An NMF-Based Building Block for Interpretable Neural Networks With Continual Learning [0.8158530638728501]
Existing learning methods often struggle to balance interpretability and predictive performance. Our approach aims to strike a better balance between these two aspects through the use of a building block based on NMF.
arXiv Detail & Related papers (2023-11-20T02:00:33Z)
Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with our proposed Diffusion Distillation-based Block-wise Neural Architecture Search (NAS). Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher. Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Learning Discrete Weights and Activations Using the Local Reparameterization Trick [21.563618480463067]
In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. By binarizing the network weights and activations, one can significantly reduce computational complexity. This leads to a more efficient neural network inference that can be deployed on low-resource devices.
arXiv Detail & Related papers (2023-07-04T12:27:10Z)
The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on BP optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require generation of additional negative samples. In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
Latent Iterative Refinement for Modular Source Separation [44.78689915209527]
Traditional source separation approaches train deep neural network models end-to-end with all the data available at once. We argue that we can significantly increase resource efficiency during both training and inference stages.
arXiv Detail & Related papers (2022-11-22T00:02:57Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
FFNB: Forgetting-Free Neural Blocks for Deep Continual Visual Learning [14.924672048447338]
We devise a dynamic network architecture for continual learning based on a novel forgetting-free neural block (FFNB). Training FFNB features on new tasks is achieved using a novel procedure that constrains the underlying parameters in the null-space of the previous tasks.
arXiv Detail & Related papers (2021-11-22T17:23:34Z)
BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement [26.39206098000297]
We present a blockwise optimization method for masking-based networks (BLOOM-Net) for training scalable speech enhancement networks. Our experiments on speech enhancement demonstrate that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to corresponding models trained end-to-end.
arXiv Detail & Related papers (2021-11-17T20:11:07Z)
Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences. We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch. The algorithm has been proved to converge and shows comparable, if not better, quality of the found solution.
arXiv Detail & Related papers (2021-02-10T01:19:15Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs). The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.