Enhancing Diffusion Models with 3D Perspective Geometry Constraints
- URL: http://arxiv.org/abs/2312.00944v1
- Date: Fri, 1 Dec 2023 21:56:43 GMT
- Title: Enhancing Diffusion Models with 3D Perspective Geometry Constraints
- Authors: Rishi Upadhyay, Howard Zhang, Yunhao Ba, Ethan Yang, Blake Gella,
Sicheng Jiang, Alex Wong, Achuta Kadambi
- Abstract summary: We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy.
We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images.
- Score: 10.21800236402905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While perspective is a well-studied topic in art, it is generally taken for
granted in images. However, for the recent wave of high-quality image synthesis
methods such as latent diffusion models, perspective accuracy is not an
explicit requirement. Since these methods are capable of outputting a wide
gamut of possible images, it is difficult to ensure that synthesized images
adhere to the principles of linear perspective. We introduce a novel geometric
constraint in the training process of generative models to enforce perspective
accuracy. We show that outputs of models trained with this constraint both
appear more realistic and improve performance of downstream models trained on
generated images. Subjective human trials show that images generated with
latent diffusion models trained with our constraint are preferred over images
from the Stable Diffusion V2 model 70% of the time. SOTA monocular depth
estimation models such as DPT and PixelFormer, fine-tuned on our images,
outperform the original models trained on real images by up to 7.03% in RMSE
and 19.3% in SqRel on the KITTI test set for zero-shot transfer.
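The constraint described here amounts to adding a perspective-consistency term to the usual diffusion training objective. The sketch below shows one plausible way such a term could be combined with the standard denoising loss; it is a minimal illustration, not the paper's actual formulation, and the names perspective_penalty, line_masks, geometry_weight, q_sample, and predict_x0 are all assumptions for the example.
```python
import torch
import torch.nn.functional as F

def perspective_penalty(x0_pred: torch.Tensor, line_masks: torch.Tensor) -> torch.Tensor:
    # x0_pred:    (B, C, H, W) images predicted/decoded by the model.
    # line_masks: (B, 1, H, W) masks marking pixels on lines that should stay
    #             straight (e.g. lines converging to a vanishing point).
    # Finite-difference image gradients.
    gx = x0_pred[..., :, 1:] - x0_pred[..., :, :-1]
    gy = x0_pred[..., 1:, :] - x0_pred[..., :-1, :]
    # Reward sharp, consistent edges along the annotated perspective lines
    # (negated so that minimizing the loss strengthens them).
    edge_x = (gx.abs() * line_masks[..., :, 1:]).mean()
    edge_y = (gy.abs() * line_masks[..., 1:, :]).mean()
    return -(edge_x + edge_y)

def training_loss(model, x0, t, noise, line_masks, geometry_weight=0.1):
    # Standard epsilon-prediction diffusion loss plus the geometric term.
    # q_sample and predict_x0 are assumed helpers on the diffusion wrapper.
    xt = model.q_sample(x0, t, noise)
    eps_pred = model(xt, t)
    denoising = F.mse_loss(eps_pred, noise)
    x0_pred = model.predict_x0(xt, t, eps_pred)
    return denoising + geometry_weight * perspective_penalty(x0_pred, line_masks)
```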
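For reference, the two depth-estimation metrics cited above have standard definitions on KITTI. A minimal sketch, assuming pred and gt are arrays of predicted and ground-truth depths restricted to valid pixels:
```python
import numpy as np

def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    # Root mean squared error between predicted and ground-truth depth.
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def sq_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    # Squared relative error: squared depth error normalized by true depth.
    return float(np.mean(((pred - gt) ** 2) / gt))
```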
Related papers
- ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [5.55656676725821]
We present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them.
This limits the discrepancies among the views guided by the SDS loss and ensures a consistent rough shape.
In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction.
arXiv Detail & Related papers (2025-02-13T12:49:25Z)
- OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs [20.652907645817713]
OFTSR is a flow-based framework for one-step image super-resolution that can produce outputs with tunable levels of fidelity and realism.
We demonstrate that OFTSR achieves state-of-the-art performance for one-step image super-resolution, while having the ability to flexibly tune the fidelity-realism trade-off.
arXiv Detail & Related papers (2024-12-12T17:14:58Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-29T11:49:39Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- YaART: Yet Another ART Rendering Technology [119.09155882164573]
This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences.
We analyze how these choices affect both the efficiency of the training process and the quality of the generated images.
We demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets.
arXiv Detail & Related papers (2024-04-08T16:51:19Z)
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective, scalable algorithm that improves diffusion models using reinforcement learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
- Conditional Image Generation with Pretrained Generative Model [1.4685355149711303]
Diffusion models have gained popularity for their ability to generate higher-quality images than GAN models.
These models require a huge amount of data, computational resources, and meticulous tuning for successful training.
We propose methods to leverage pre-trained unconditional diffusion models with additional guidance for conditional image generation.
arXiv Detail & Related papers (2023-12-20T18:27:53Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, powered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Image Completion via Inference in Deep Generative Models [16.99337751292915]
We consider image completion from the perspective of amortized inference in an image generative model.
We demonstrate superior sample quality and diversity compared to prior art on the CIFAR-10 and FFHQ-256 datasets.
arXiv Detail & Related papers (2021-02-24T02:59:43Z)
- Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces.
We can effortlessly scale score-based generative models to images with unprecedented resolutions.
Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.