Related papers: Representation Learning in Continuous-Time Score-Based Generative Models

URL: http://arxiv.org/abs/2105.14257v1
Date: Sat, 29 May 2021 09:26:02 GMT
Title: Representation Learning in Continuous-Time Score-Based Generative Models
Authors: Korbinian Abstreiter, Stefan Bauer, Arash Mehrjou
Abstract summary: Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score-matching framework to enable representation learning without any supervised signal. Unlike GANs and VAEs, score-based representation learning relies on a new formulation of the denoising score-matching objective.
Score: 19.990583896271573
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Score-based methods represented as stochastic differential equations on a continuous time domain have recently proven successful as a non-adversarial generative model. Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score-matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes to data samples. In contrast, score-based representation learning relies on a new formulation of the denoising score-matching objective and thus encodes information needed for denoising. We show how this difference allows for manual control of the level of detail encoded in the representation.
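The abstract refers to the denoising score-matching objective without stating it. As a rough illustration only, the sketch below shows the standard form of that objective for a Gaussian perturbation kernel, not the paper's new representation-learning formulation. The names `dsm_loss` and `score_fn`, the scalar noise level `sigma`, and the `sigma**2` weighting are assumptions introduced for this example.

```python
import numpy as np

def dsm_loss(score_fn, x0, eps, sigma):
    """Standard denoising score matching loss (illustrative sketch).

    x0:    clean samples, shape (batch, dim)
    eps:   standard normal noise, same shape as x0
    sigma: scalar noise level (assumed > 0)
    """
    # Perturb the data: x_t ~ N(x0, sigma^2 I).
    x_t = x0 + sigma * eps
    # Score of the Gaussian perturbation kernel at x_t:
    # grad_x log p(x_t | x0) = -(x_t - x0) / sigma^2 = -eps / sigma.
    target = -eps / sigma
    residual = score_fn(x_t, sigma) - target
    # Weight by sigma^2 so that different noise levels contribute
    # comparably (a common choice, assumed here).
    return float(np.mean(sigma**2 * np.sum(residual**2, axis=-1)))

# Hypothetical usage with a placeholder score model that always returns zero.
zero_score = lambda x, sigma: np.zeros_like(x)
loss = dsm_loss(zero_score, np.array([[0.0]]), np.array([[1.0]]), sigma=2.0)
```

In a real model `score_fn` would be a neural network trained over many noise levels. According to the abstract, the representation-learning variant changes this objective so that information needed for denoising is encoded in the representation; the exact form is not given on this page.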

Related papers

Automated Learning of Semantic Embedding Representations for Diffusion Models [1.688134675717698]
We employ a multi-level denoising autoencoder framework to expand the representation capacity of denoising diffusion models. Our work justifies that DDMs are not only suitable for generative tasks, but also potentially advantageous for general-purpose deep learning applications.
arXiv Detail & Related papers (2025-05-09T02:10:46Z)
Video Summarization using Denoising Diffusion Probabilistic Model [21.4190413531697]
We introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction. Our method is more resistant to subjective annotation noise, and is less prone to overfitting the training data than discriminative methods, with strong generalization ability.
arXiv Detail & Related papers (2024-12-11T13:02:09Z)
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
Sample what you can't compress [6.24979299238534]
We show how to learn a continuous encoder and decoder under a diffusion-based loss. This approach yields better reconstruction quality as compared to GAN-based autoencoders. We also show that the resulting representation is easier to model with a latent diffusion model as compared to the representation obtained from a state-of-the-art GAN-based loss.
arXiv Detail & Related papers (2024-09-04T08:42:42Z)
Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation [24.436957604430678]
We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
arXiv Detail & Related papers (2023-09-27T15:32:46Z)
Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners. DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders. Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet.
arXiv Detail & Related papers (2023-03-17T04:20:47Z)
Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. It exploits semantic features of pretrained classification networks, then it implicitly matches the probabilistic distribution of clear images at the semantic feature space. By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning. A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives. We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations. We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieves state-of-the-art performance on a number of video benchmarks.
arXiv Detail & Related papers (2021-04-01T16:48:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.