T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities
- URL: http://arxiv.org/abs/2305.14674v1
- Date: Wed, 24 May 2023 03:32:03 GMT
- Title: T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities
- Authors: Kangfu Mei and Mo Zhou and Vishal M. Patel
- Abstract summary: Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm that focuses on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
- Score: 69.16656086708291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Probabilistic Field (DPF) models the distribution of continuous
functions defined over metric spaces. While DPF shows great potential for
unifying data generation of various modalities including images, videos, and 3D
geometry, it does not scale to higher data resolutions. This can be attributed
to the "scaling property", where it is difficult for the model to capture local
structures through uniform sampling. To this end, we propose a new model
comprising a view-wise sampling algorithm that focuses on local structure
learning, together with additional guidance, e.g., a text description, to
complement the global geometry. The model can be scaled to generate
high-resolution data while unifying multiple modalities. Experimental results
on data generation in various modalities demonstrate the effectiveness of our
model, as well as its potential as a foundation framework for scalable
modality-unified visual content generation.
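To make the abstract's contrast between uniform sampling and view-wise sampling concrete, below is a minimal PyTorch sketch. It represents an image as a field of coordinate-value pairs and samples context either uniformly over the whole field or densely inside one local view. The function names, tensor shapes, and the crop-based view selection are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of uniform vs. view-wise context sampling for a
# field-based diffusion model. Names and shapes are assumptions.
import torch

def image_as_field(image):
    """Represent an image (C, H, W) as coordinate-value pairs:
    coordinates in [0, 1]^2 and the pixel signals as field values."""
    c, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij"
    )
    coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)  # (H*W, 2)
    values = image.permute(1, 2, 0).reshape(-1, c)         # (H*W, C)
    return coords, values

def uniform_sample(coords, values, n):
    """Baseline: n pairs drawn uniformly over the whole field; at high
    resolution these points cover local structure only sparsely."""
    idx = torch.randperm(coords.shape[0])[:n]
    return coords[idx], values[idx]

def view_wise_sample(image, view_size, n):
    """Sample all n pairs from one random local view (crop), so each
    training step sees dense local structure."""
    c, h, w = image.shape
    top = torch.randint(0, h - view_size + 1, (1,)).item()
    left = torch.randint(0, w - view_size + 1, (1,)).item()
    view = image[:, top:top + view_size, left:left + view_size]
    coords, values = image_as_field(view)
    # Map the view-local coordinates back into the global [0, 1]^2 frame.
    coords[:, 0] = (coords[:, 0] * (view_size - 1) + top) / max(h - 1, 1)
    coords[:, 1] = (coords[:, 1] * (view_size - 1) + left) / max(w - 1, 1)
    idx = torch.randperm(coords.shape[0])[:n]
    return coords[idx], values[idx]

# Usage: dense local context for the denoiser instead of sparse global points.
img = torch.rand(3, 256, 256)
ctx_coords, ctx_values = view_wise_sample(img, view_size=32, n=1024)
```

The intuition follows the abstract: as resolution grows, uniformly sampled context pairs become too sparse to convey local structure, whereas dense sampling within a view keeps local detail learnable; additional guidance such as a text description then complements the global geometry.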
Related papers
- Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds [6.69660410213287]
We propose an innovative framework called Point-MGE to explore the benefits of deeply integrating 3D representation learning and generative learning.
In shape classification, Point-MGE achieved an accuracy of 94.2% (+1.0%) on the ModelNet40 dataset and 92.9% (+5.5%) on the ScanObjectNN dataset.
Experimental results also confirmed that Point-MGE can generate high-quality 3D shapes in both unconditional and conditional settings.
arXiv Detail & Related papers (2024-06-25T07:57:03Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification [42.15709954199397]
A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper.
First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data.
A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
arXiv Detail & Related papers (2023-11-17T04:06:20Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Learning Structured Output Representations from Attributes using Deep Conditional Generative Models [0.0]
This paper recreates the Conditional Variational Auto-encoder architecture and trains it on images conditioned on attributes.
We attempt to generate new faces with distinct attributes such as hair color and glasses, as well as different bird species samples.
arXiv Detail & Related papers (2023-04-30T17:25:31Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space provides an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Diffusion Probabilistic Fields [42.428882785136295]
We introduce Diffusion Probabilistic Fields (DPF), a diffusion model that can learn distributions over continuous functions defined over metric spaces.
We empirically show that DPF effectively deals with different modalities like 2D images and 3D geometry, in addition to modeling distributions over fields defined on non-Euclidean metric spaces.
arXiv Detail & Related papers (2023-03-01T01:37:24Z) - Relational VAE: A Continuous Latent Variable Model for Graph Structured Data [0.0]
We show applications to structured probability density modeling for simulated and real wind farm monitoring data.
We release the source code, along with the simulated datasets.
arXiv Detail & Related papers (2021-06-30T13:24:27Z) - Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.