Deformably-Scaled Transposed Convolution
- URL: http://arxiv.org/abs/2210.09446v1
- Date: Mon, 17 Oct 2022 21:35:29 GMT
- Title: Deformably-Scaled Transposed Convolution
- Authors: Stefano B. Blumberg, Daniele Ravì, Mou-Cheng Xu, Matteo Figini, Iasonas Kokkinos, Daniel C. Alexander
- Abstract summary: We revisit transposed convolution and introduce a novel layer that allows us to place information in the image selectively.
Our novel layer can be used as a drop-in replacement for 2D and 3D upsampling operators and the code will be publicly available.
- Score: 17.4596321623511
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transposed convolution is crucial for generating high-resolution outputs, yet
has received little attention compared to convolution layers. In this work we
revisit transposed convolution and introduce a novel layer that allows us to
place information in the image selectively and choose the 'stroke breadth' at
which the image is synthesized, whilst incurring a small additional parameter
cost. For this we introduce three ideas: firstly, we regress offsets to the
positions where the transposed convolution results are placed; secondly we
broadcast the offset weight locations over a learnable neighborhood; and
thirdly we use a compact parametrization to share weights and restrict offsets.
We show that simply substituting upsampling operators with our novel layer
produces substantial improvements across tasks as diverse as instance
segmentation, object detection, semantic segmentation, generative image
modeling, and 3D magnetic resonance image enhancement, while outperforming all
existing variants of transposed convolutions. Our novel layer can be used as a
drop-in replacement for 2D and 3D upsampling operators and the code will be
publicly available.
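To make the first of the three ideas concrete, here is a minimal PyTorch sketch of offset regression under our own assumptions; it is not the authors' released code (which will be public per the abstract), and the class and parameter names are hypothetical. The paper scatters transposed-convolution results to offset positions over a learnable neighborhood; for brevity this sketch approximates that with a gather via `grid_sample`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetTransposedConv2d(nn.Module):
    """Illustrative sketch only: a transposed convolution whose output is
    re-positioned by regressed offsets. The paper scatters weighted values
    to offset locations over a learnable neighborhood; this approximation
    gathers with grid_sample instead, for brevity."""

    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        # k = 2*stride, p = stride//2 gives exact s-times upsampling for even strides.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2 * stride,
                                     stride=stride, padding=stride // 2)
        # Small head regressing a 2D offset per location; zero-initialized
        # so the layer starts out as a plain transposed convolution.
        self.offset_head = nn.Conv2d(in_ch, 2, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset_head.weight)
        nn.init.zeros_(self.offset_head.bias)

    def forward(self, x):
        y = self.up(x)                                    # (B, C_out, sH, sW)
        B, _, H, W = y.shape
        # Offsets in normalized [-1, 1] grid units, upsampled to output size.
        off = F.interpolate(self.offset_head(x), size=(H, W),
                            mode='bilinear', align_corners=True)
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=x.device),
                                torch.linspace(-1, 1, W, device=x.device),
                                indexing='ij')
        grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        grid = grid + off.permute(0, 2, 3, 1)             # shift sample sites
        # align_corners=True makes a zero-offset grid an exact identity.
        return F.grid_sample(y, grid, mode='bilinear', align_corners=True)
```

Because the offset head is zero-initialized, the layer initially behaves as a plain transposed convolution, which is what makes this style of layer a safe drop-in replacement for existing upsampling operators.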
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics.
We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections.
Our approach outperforms existing approaches in the rendering quality of novel view and appearance synthesis, with fast convergence and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z)
- CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians [18.42203035154126]
We introduce a structured Gaussian representation that can be controlled in 2D image space.
We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization.
We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
arXiv Detail & Related papers (2024-03-28T15:27:13Z)
- CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering.
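As a concrete illustration of the voxel-based sampling step, here is a generic sketch built on the standard ray/box slab test; this is our own assumption of the mechanism, not the CVT-xRF code, and all names are hypothetical.

```python
import numpy as np

def sample_points_in_voxel(origin, direction, vmin, vmax, n=8, rng=None):
    """Sample n points where a ray hits the axis-aligned voxel [vmin, vmax].
    Standard slab test; returns None if the ray misses the voxel."""
    rng = rng or np.random.default_rng()
    inv = 1.0 / direction                         # assumes no zero components
    t0, t1 = (vmin - origin) * inv, (vmax - origin) * inv
    t_near = max(np.minimum(t0, t1).max(), 0.0)   # entry depth, clipped at origin
    t_far = np.maximum(t0, t1).min()              # exit depth
    if t_near >= t_far:
        return None
    depths = rng.uniform(t_near, t_far, size=n)   # random depths inside the voxel
    return origin + depths[:, None] * direction   # (n, 3) points in the voxel
```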
arXiv Detail & Related papers (2024-03-25T15:56:17Z)
- Meta-Auxiliary Network for 3D GAN Inversion [18.777352198191004]
In this work, we present a novel meta-auxiliary framework that leverages newly developed 3D GANs as the generator.
In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques.
An auxiliary network refines the generator with the given image as input, predicting both offsets for the weights of convolutional layers and sampling positions for volume rendering.
In the second stage, we perform meta-learning to quickly adapt the auxiliary network to the input image; the final reconstructed image is then synthesized via the meta-learned auxiliary network.
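A heavily simplified sketch of what second-stage fast adaptation could look like follows; this is our reading of the summary rather than the paper's method, and the `aux_net` and `generator` call signatures are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(aux_net, generator, image, steps=5, lr=1e-4):
    """Take a few gradient steps so the auxiliary network fits the single
    input image, then synthesize the final reconstruction. The aux_net and
    generator interfaces here are hypothetical placeholders."""
    opt = torch.optim.Adam(aux_net.parameters(), lr=lr)
    for _ in range(steps):
        recon = generator(aux_net(image))     # refine generator via aux_net
        loss = F.l1_loss(recon, image)        # simple reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return generator(aux_net(image))      # final reconstructed image
```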
arXiv Detail & Related papers (2023-05-18T11:26:27Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Learning Local Displacements for Point Cloud Completion [93.54286830844134]
We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud.
Our architecture relies on three novel layers that are used successively within an encoder-decoder structure.
We evaluate the resulting architectures on object and indoor scene completion tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T18:31:37Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes using the self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation [19.53151547706724]
Transformer-based models have recently drawn attention in medical image segmentation.
We propose Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling.
It has fewer parameters and takes less GPU memory to train than the previous transformer-based models.
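The efficiency claim can be illustrated with a generic axial-attention layer, sketched below under our own assumptions (AFTer-UNet's actual axial fusion mechanism differs in detail): attending along one spatial axis at a time keeps the sequence length at D, H, or W rather than D*H*W.

```python
import torch
import torch.nn as nn

class AxialAttention3d(nn.Module):
    """Generic axial-attention sketch (our assumption of the broad idea,
    not AFTer-UNet itself): attend along one spatial axis at a time, so
    the attention sequence length is D, H or W instead of D*H*W."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, axis):
        # x: (B, C, D, H, W); axis in {2, 3, 4} picks D, H or W.
        B, C = x.shape[:2]
        x = x.movedim(axis, -1).movedim(1, -1)     # (B, a, b, L, C)
        a, b, L = x.shape[1:4]
        seq = x.reshape(B * a * b, L, C)           # one sequence per voxel line
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(B, a, b, L, C).movedim(-1, 1).movedim(-1, axis)
```

Running such a layer over each axis in turn gives full-volume context at a fraction of the memory cost of dense 3D attention.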
arXiv Detail & Related papers (2021-10-20T06:47:28Z)
- Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation [31.72090839643412]
We introduce a novel Group Shift Pointwise Convolution (GSP-Conv) to improve the effectiveness and efficiency of 3D convolutions.
GSP-Conv simplifies 3D convolutions into pointwise ones with 1x1x1 kernels, which dramatically reduces the number of model parameters and FLOPs.
Results show that our method, with substantially decreased model complexity, achieves comparable or even better performance than models employing 3D convolutions.
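The parameter reduction is easy to verify: replacing a 3x3x3 kernel with a 1x1x1 kernel removes the kernel-volume factor of 27. A minimal check with hypothetical channel sizes (the group-shift component, which restores spatial context, is omitted here):

```python
import torch.nn as nn

c_in, c_out = 64, 64   # hypothetical channel sizes

dense = nn.Conv3d(c_in, c_out, kernel_size=3, padding=1)   # full 3x3x3 kernel
point = nn.Conv3d(c_in, c_out, kernel_size=1)              # pointwise 1x1x1

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(point))   # 110656 vs 4160: ~27x fewer weights
```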
arXiv Detail & Related papers (2021-09-26T15:27:33Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional Neural Network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.