Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code
Diffusion using Transformers
- URL: http://arxiv.org/abs/2308.14152v1
- Date: Sun, 27 Aug 2023 16:22:09 GMT
- Title: Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code
Diffusion using Transformers
- Authors: Abril Corona-Figueroa, Sam Bond-Taylor, Neelanjan Bhowmik, Yona
Falinie A. Gaus, Toby P. Breckon, Hubert P. H. Shum, Chris G. Willcocks
- Abstract summary: We propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes.
Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views.
- Score: 26.500355873271634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating 3D images of complex objects conditionally from a few 2D views is
a difficult synthesis problem, compounded by issues such as domain gap and
geometric misalignment. For instance, unified frameworks such as Generative
Adversarial Networks cannot achieve this unless they explicitly define both a
domain-invariant and a geometric-invariant joint latent distribution, whereas
Neural Radiance Fields are generally unable to handle both issues as they
optimize at the pixel level. By contrast, we propose a simple and novel 2D to
3D synthesis approach based on conditional diffusion with vector-quantized
codes. Operating in an information-rich code space enables high-resolution 3D
synthesis via full-coverage attention across the views. Specifically, we
generate the 3D codes (e.g. for CT images) conditional on previously generated
3D codes and the entire codebook of two 2D views (e.g. 2D X-rays). Qualitative
and quantitative results demonstrate state-of-the-art performance over
specialized methods across varied evaluation criteria, including fidelity
metrics such as density and coverage, and distortion metrics, for two complex
volumetric imagery datasets from real-world scenarios.
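The conditional code-diffusion idea lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch of discrete diffusion over vector-quantized code indices: a transformer denoises masked 3D codes while attending jointly to the code sequences of two 2D views, in the spirit of the "full-coverage attention" the abstract describes. All module names, sequence lengths, and the absorbing-state corruption schedule are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of conditional discrete diffusion over VQ code indices.
# Assumptions (not from the paper): absorbing-state corruption, sequence
# lengths, codebook size, and all layer hyperparameters.
import torch
import torch.nn as nn

class CodeDenoiser(nn.Module):
    def __init__(self, vocab=1024, dim=512, heads=8, layers=8,
                 len_2d=2 * 16 * 16, len_3d=8 * 8 * 8):
        super().__init__()
        self.mask_id = vocab                      # extra "absorbed" token id
        self.tok = nn.Embedding(vocab + 1, dim)   # code-index embedding
        self.pos = nn.Parameter(torch.zeros(len_2d + len_3d, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, vocab)         # logits over the codebook
        self.len_2d = len_2d

    def forward(self, codes_2d, noisy_codes_3d):
        # Full-coverage attention: 2D-view codes and 3D codes share one
        # sequence, so every 3D position can attend to every 2D code.
        x = torch.cat([codes_2d, noisy_codes_3d], dim=1)
        h = self.backbone(self.tok(x) + self.pos)
        return self.head(h[:, self.len_2d:])      # predict only 3D positions

def diffusion_step(model, codes_2d, codes_3d, t, T=256):
    # Absorbing-state corruption: mask a t/T fraction of the 3D codes, then
    # train the model to recover the original indices at masked positions.
    mask = torch.rand_like(codes_3d, dtype=torch.float) < (t / T)
    noisy = codes_3d.masked_fill(mask, model.mask_id)
    logits = model(codes_2d, noisy)
    return nn.functional.cross_entropy(logits[mask], codes_3d[mask])
```

At sampling time, one would start from fully masked 3D code indices, iteratively re-predict them over the diffusion steps, and map the final indices through the VQ decoder to obtain the 3D volume.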
Related papers
- VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing [22.39760469467524]
We propose a variance-aligned texture synthesis method to address the modal gap between the 2D and 3D diffusion models.
We present an inpainting module to refine details in conflicting regions.
arXiv Detail & Related papers (2024-07-05T12:11:33Z)
- Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling [14.341099905684844]
This paper investigates a 2D to 3D image translation method with a straightforward technique, enabling correlated 2D X-ray to 3D CT-like reconstruction.
We observe that existing approaches, which integrate information across multiple 2D views in the latent space, lose valuable signal information during latent encoding. Instead, we simply repeat and concatenate the 2D views into higher-channel 3D volumes and approach the 3D reconstruction challenge as a straightforward 3D to 3D generative modeling problem (see the sketch below).
This method enables the reconstructed 3D volume to retain valuable information from the 2D inputs, which are passed between channel states in a Swin U
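As a concrete illustration of the repeat-and-concatenate idea referenced above, the following hypothetical PyTorch sketch tiles two 2D views along the depth axis to form a higher-channel pseudo-volume that a 3D-to-3D generative model can consume directly. The shapes and the function name are assumptions for illustration, not the paper's published code.

```python
# Hypothetical sketch: tile two (B, 1, H, W) views along depth so the 3D
# reconstruction task becomes plain 3D-to-3D modeling. Shapes are assumed.
import torch

def views_to_volume(view_a: torch.Tensor, view_b: torch.Tensor,
                    depth: int) -> torch.Tensor:
    """Repeat and concatenate two 2D views into a (B, 2, depth, H, W) volume."""
    stacked = torch.cat([view_a, view_b], dim=1)              # (B, 2, H, W)
    volume = stacked.unsqueeze(2).repeat(1, 1, depth, 1, 1)   # tile along depth
    return volume  # each channel carries one view at every depth slice

x = views_to_volume(torch.randn(1, 1, 128, 128),
                    torch.randn(1, 1, 128, 128), depth=128)
print(x.shape)  # torch.Size([1, 2, 128, 128, 128])
```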
arXiv Detail & Related papers (2024-06-26T15:18:20Z)
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of 2D features projected along each ray into 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance of the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the normalized device coordinates space.
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z)
- Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator [68.0533826852601]
3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes.
Existing methods fail to recover reasonably accurate 3D shapes.
We propose a geometry-aware discriminator to improve 3D-aware GANs.
arXiv Detail & Related papers (2022-09-30T17:59:37Z)
- 3D-Aware Indoor Scene Synthesis with Depth Priors [62.82867334012399]
Existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside them.
We argue that indoor scenes do not have a shared intrinsic structure, and hence 2D images alone cannot adequately guide the model with 3D geometry.
arXiv Detail & Related papers (2022-02-17T09:54:29Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study this problem with a solution built on a fully convolutional single-stage detector.
Our solution achieves 1st place among all vision-only methods in the NeurIPS 2020 nuScenes 3D detection challenge.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Bidirectional Projection Network for Cross Dimension Scene Understanding [69.29443390126805]
We present a bidirectional projection network (BPNet) for joint 2D and 3D reasoning in an end-to-end manner.
Via the BPM (bidirectional projection module), complementary 2D and 3D information can interact with each other at multiple architectural levels.
Our BPNet achieves top performance on the ScanNetV2 benchmark for both 2D and 3D semantic segmentation.
arXiv Detail & Related papers (2021-03-26T08:31:39Z)
- Generalizing Spatial Transformers to Projective Geometry with Applications to 2D/3D Registration [11.219924013808852]
Differentiable rendering is a technique to connect 3D scenes with corresponding 2D images.
We propose a novel Projective Spatial Transformer module that generalizes spatial transformers to projective geometry.
arXiv Detail & Related papers (2020-03-24T17:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.