Generalizing Spatial Transformers to Projective Geometry with
Applications to 2D/3D Registration
- URL: http://arxiv.org/abs/2003.10987v1
- Date: Tue, 24 Mar 2020 17:26:50 GMT
- Title: Generalizing Spatial Transformers to Projective Geometry with
Applications to 2D/3D Registration
- Authors: Cong Gao, Xingtong Liu, Wenhao Gu, Benjamin Killeen, Mehran Armand,
Russell Taylor and Mathias Unberath
- Abstract summary: Differentiable rendering is a technique to connect 3D scenes with corresponding 2D images.
We propose a novel Projective Spatial Transformer module that generalizes spatial transformers to projective geometry.
- Score: 11.219924013808852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable rendering is a technique to connect 3D scenes with
corresponding 2D images. Since it is differentiable, processes during image
formation can be learned. Previous approaches to differentiable rendering focus
on mesh-based representations of 3D scenes, which are inappropriate for medical
applications where volumetric, voxelized models are used to represent anatomy.
We propose a novel Projective Spatial Transformer module that generalizes
spatial transformers to projective geometry, thus enabling differentiable
volume rendering. We demonstrate the usefulness of this architecture on the
task of 2D/3D registration between radiographs and CT scans. Specifically,
we show that our transformer enables end-to-end learning of an image processing
and projection model that approximates an image similarity function that is
convex with respect to the pose parameters, and can thus be optimized
effectively using conventional gradient descent. To the best of our knowledge,
this is the first time that spatial transformers have been described for
projective geometry. The source code will be made public upon publication of
this manuscript and we hope that our developments will benefit related 3D
research applications.
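The core mechanism the abstract describes — projective sampling of a voxel volume that stays differentiable all the way back to the pose parameters — can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses parallel-beam rays and a translation-only pose (the paper handles rigid poses under true projective geometry), and the helper name `render_drr` is invented here.

```python
# Minimal sketch of differentiable volume projection, in the spirit of a
# Projective Spatial Transformer. Assumptions (not from the paper): parallel-
# beam rays along z, a translation-only pose, and the invented helper
# `render_drr`. The point is that F.grid_sample lets gradients of an image
# loss flow back into the pose parameters.
import torch
import torch.nn.functional as F

def render_drr(volume, translation, det_size=64, n_steps=64):
    """Render a parallel-beam projection of `volume` at a translated pose.

    volume:      (1, 1, D, H, W) CT intensities
    translation: (3,) offset in normalized [-1, 1] volume coordinates
    returns:     (det_size, det_size) projection image
    """
    device = volume.device
    u = torch.linspace(-1.0, 1.0, det_size, device=device)  # detector cols
    v = torch.linspace(-1.0, 1.0, det_size, device=device)  # detector rows
    t = torch.linspace(-1.0, 1.0, n_steps, device=device)   # depths along ray
    zz, yy, xx = torch.meshgrid(t, v, u, indexing="ij")
    # Sample points, shape (n_steps, det, det, 3) ordered (x, y, z);
    # the pose enters the computation graph here.
    pts = torch.stack([xx, yy, zz], dim=-1) + translation
    grid = pts.unsqueeze(0)  # (1, n_steps, det, det, 3) for 5-D grid_sample
    # Trilinear interpolation is differentiable w.r.t. the sampling grid.
    samples = F.grid_sample(volume, grid, align_corners=True)
    return samples.sum(dim=2)[0, 0]  # integrate along the ray (depth) axis

# Toy volume: a bright off-center block standing in for anatomy.
vol = torch.zeros(1, 1, 64, 64, 64)
vol[..., 24:40, 20:36, 28:44] = 1.0

# Target image rendered at the "true" pose; recover it from a wrong guess.
target = render_drr(vol, torch.tensor([0.2, -0.1, 0.0]))
pose = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([pose], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    loss = F.mse_loss(render_drr(vol, pose), target)
    loss.backward()  # gradients reach `pose` through grid_sample
    opt.step()
print(pose.detach())  # should move toward (0.2, -0.1, 0.0)
```

In the paper, the plain MSE loss in this loop is replaced by an image similarity function learned end-to-end so that it is approximately convex in the pose parameters; the sketch only demonstrates the differentiable projection that makes such gradient-based pose optimization possible.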
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- TP3M: Transformer-based Pseudo 3D Image Matching with Reference Image [0.9831489366502301]
We propose a Transformer-based pseudo 3D image matching method.
It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image and matches them to the 2D features extracted from the destination image.
Experimental results on multiple datasets show that the proposed method achieves the state-of-the-art on the tasks of homography estimation, pose estimation and visual localization.
arXiv Detail & Related papers (2024-05-14T08:56:09Z)
- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z)
- Multiple View Geometry Transformers for 3D Human Pose Estimation [35.26756920323391]
We aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation.
We propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner.
arXiv Detail & Related papers (2023-11-18T06:32:40Z)
- Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers [26.500355873271634]
We propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes.
Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views.
arXiv Detail & Related papers (2023-08-27T16:22:09Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation [47.945556996219295]
We present a novel alignment-before-generation approach to generate 3D shapes based on 2D images or texts.
Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM).
arXiv Detail & Related papers (2023-06-29T17:17:57Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior.
Previous reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry.
This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)
- 3D-Aware Indoor Scene Synthesis with Depth Priors [62.82867334012399]
Existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside.
We argue that indoor scenes do not have a shared intrinsic structure, and hence 2D images alone cannot adequately guide the model with 3D geometry.
arXiv Detail & Related papers (2022-02-17T09:54:29Z)
- Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.