StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
- URL: http://arxiv.org/abs/2403.04965v2
- Date: Sun, 2 Jun 2024 14:31:09 GMT
- Title: StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
- Authors: Lezhong Wang, Jeppe Revall Frisvad, Mark Bo Jensen, Siavash Arjomand Bigdeli,
- Abstract summary: We introduce StereoDiffusion, a method that is trainning free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model.
Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs.
Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
- Score: 2.9260206957981167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The demand for stereo images increases as manufacturers launch more XR devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is trainning free, remarkably straightforward to use, and it seamlessly integrates into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layers Modification methods to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
Related papers
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling [18.02254687807291]
Transformer-based stereo models have been studied recently, their performance still lags behind CNN-based stereo models due to the inherent data scarcity issue in the stereo matching task.
We propose Masked Image Modeling Distilled Stereo matching model, termed MaDis-Stereo, that enhances locality inductive bias by leveraging Masked Image Modeling (MIM) in training Transformer-based stereo model.
arXiv Detail & Related papers (2024-09-04T16:17:45Z) - Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z) - Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs)
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z) - FreePIH: Training-Free Painterly Image Harmonization with Diffusion
Model [19.170302996189335]
Our FreePIH method tames the denoising process as a plug-in module for foreground image style transfer.
We make use of multi-scale features to enforce the consistency of the content and stability of the foreground objects in the latent space.
Our method can surpass representative baselines by large margins.
arXiv Detail & Related papers (2023-11-25T04:23:49Z) - Gradpaint: Gradient-Guided Inpainting with Diffusion Models [71.47496445507862]
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation.
We present GradPaint, which steers the generation towards a globally coherent image.
We generalizes well to diffusion models trained on various datasets, improving upon current state-of-the-art supervised and unsupervised methods.
arXiv Detail & Related papers (2023-09-18T09:36:24Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z) - Deep Uncalibrated Photometric Stereo via Inter-Intra Image Feature
Fusion [17.686973510425172]
This paper presents a new method for deep uncalibrated photometric stereo.
It efficiently utilizes the inter-image representation to guide the normal estimation.
Our method produces significantly better results than the state-of-the-art methods on both synthetic and real data.
arXiv Detail & Related papers (2022-08-06T03:59:54Z) - Recursive Self-Improvement for Camera Image and Signal Processing
Pipeline [6.318974730864278]
Current camera image and signal processing pipelines (ISPs) tend to apply a single filter that is uniformly applied to the entire image.
This despite the fact that most acquired camera images have spatially heterogeneous artifacts.
We present a deep reinforcement learning model that works in learned latent subspaces.
arXiv Detail & Related papers (2021-11-15T02:23:40Z) - Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS)
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.