The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning
via Pose-aware Convolution
- URL: http://arxiv.org/abs/2205.15210v1
- Date: Mon, 30 May 2022 16:11:55 GMT
- Title: The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning
via Pose-aware Convolution
- Authors: Ronghan Chen, Yang Cong
- Abstract summary: We develop a Pose-aware Rotation Invariant Convolution (i.e., PaRI-Conv) that dynamically adapts its kernels to relative poses.
We propose an Augmented Point Pair Feature (APPF) to fully encode the RI relative pose information, and a factorized dynamic kernel for pose-aware kernel generation.
Our PaRI-Conv surpasses the state-of-the-art RI methods while being more compact and efficient.
- Score: 18.595285633151715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rotation-invariant (RI) 3D deep learning methods suffer performance
degradation because they typically take RI representations as input, which lose
critical global information compared with raw 3D coordinates. Most
state-of-the-art methods address this by adding extra blocks or complex global
representations, in a heavy and ineffective manner. In this paper, we reveal that the global
information loss stems from an unexplored pose information loss problem, which
can be solved more efficiently and effectively as we only need to restore more
lightweight local pose in each layer, and the global information can be
hierarchically aggregated in the deep networks without extra efforts. To
address this problem, we develop a Pose-aware Rotation Invariant Convolution
(i.e., PaRI-Conv), which dynamically adapts its kernels based on the relative
poses. To implement it, we propose an Augmented Point Pair Feature (APPF) to
fully encode the RI relative pose information, and a factorized dynamic kernel
for pose-aware kernel generation, which can further reduce the computational
cost and memory burden by decomposing the kernel into a shared basis matrix and
a pose-aware diagonal matrix. Extensive experiments on shape classification and
part segmentation tasks show that our PaRI-Conv surpasses the state-of-the-art
RI methods while being more compact and efficient.
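The factorized dynamic kernel described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name, shapes, and the linear map from pose feature to diagonal entries are illustrative assumptions. The point it demonstrates is the decomposition itself: instead of predicting a full `c_out x c_in` kernel per neighbor, only a `c_in`-dimensional diagonal is predicted and combined with a shared basis matrix, which is where the compute and memory savings come from.

```python
import numpy as np

def factorized_dynamic_kernel(pose_feat, basis, mlp_w):
    """Sketch of a factorized pose-aware kernel (illustrative, not the
    paper's code): the kernel is decomposed into a shared basis matrix
    and a pose-aware diagonal matrix, so only a small diagonal has to
    be generated per neighbor."""
    # pose_feat: (d_pose,) RI relative-pose descriptor (e.g., an APPF)
    # basis:     (c_out, c_in) shared basis matrix, learned once
    # mlp_w:     (c_in, d_pose) weights of an assumed tiny linear map
    #            from the pose feature to the diagonal entries
    diag = mlp_w @ pose_feat          # (c_in,) pose-aware diagonal
    kernel = basis * diag[None, :]    # (c_out, c_in), i.e. basis @ diag(diag)
    return kernel

rng = np.random.default_rng(0)
pose = rng.normal(size=4)
basis = rng.normal(size=(8, 6))
w = rng.normal(size=(6, 4))
k = factorized_dynamic_kernel(pose, basis, w)
```

Broadcasting the diagonal across the basis columns is equivalent to the explicit matrix product `basis @ np.diag(w @ pose)` but avoids materializing the diagonal matrix.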
Related papers
- LieRE: Generalizing Rotary Position Encodings [4.07373334379699]
Rotary Position Embedding (RoPE) has emerged as a popular position encoding in language models.
RoPE is constrained to one-dimensional sequence data.
LieRE replaces RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity.
arXiv Detail & Related papers (2024-06-14T17:41:55Z)
- IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images [50.4538089115248]
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task.
We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion.
Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods.
arXiv Detail & Related papers (2024-03-30T07:17:37Z)
- Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
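The LRSA idea summarized above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the LRFormer implementation: the pooling scheme, the fixed grid size, and the nearest-neighbor upsampling are assumptions. It shows the core mechanism, that attention is computed on a fixed number of low-resolution tokens, so its cost stays constant as input resolution grows.

```python
import numpy as np

def low_res_self_attention(x, pool=8):
    """Illustrative sketch of low-resolution self-attention (not the
    authors' code): pool the feature map to a fixed (pool x pool) grid,
    run plain self-attention there, then upsample back."""
    h, w, c = x.shape
    # 1. Average-pool to a fixed grid, regardless of input resolution.
    ys = np.linspace(0, h, pool + 1).astype(int)
    xs = np.linspace(0, w, pool + 1).astype(int)
    low = np.stack([
        x[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(0, 1))
        for i in range(pool) for j in range(pool)
    ])                                          # (pool*pool, c) tokens
    # 2. Plain softmax self-attention on the low-resolution tokens.
    scores = low @ low.T / np.sqrt(c)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    out_low = (attn @ low).reshape(pool, pool, c)
    # 3. Nearest-neighbor upsample back to the input resolution.
    rows = np.minimum(np.arange(h) * pool // h, pool - 1)
    cols = np.minimum(np.arange(w) * pool // w, pool - 1)
    return out_low[rows][:, cols]

x = np.random.default_rng(1).normal(size=(32, 48, 4))
y = low_res_self_attention(x)
```

Because the attention matrix is always `(pool*pool, pool*pool)`, doubling the input resolution only changes the pooling and upsampling steps, not the quadratic attention cost.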
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Learning Continuous Depth Representation via Geometric Spatial Aggregator [47.1698365486215]
We propose a novel continuous depth representation for depth map super-resolution (DSR)
The heart of this representation is our proposed Geometric Spatial Aggregator (GSA), which exploits a distance field modulated by arbitrarily upsampled target gridding.
We also present a transformer-style backbone named GeoDSR, which possesses a principled way to construct the functional mapping between local coordinates.
arXiv Detail & Related papers (2022-12-07T07:48:23Z)
- Super-Resolution Based Patch-Free 3D Image Segmentation with High-Frequency Guidance [20.86089285980103]
High-resolution (HR) 3D images, such as medical Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans, are widely used nowadays.
arXiv Detail & Related papers (2022-10-26T11:46:08Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose rank-enhanced low-dimensional convolution set (Re-ConvSet)
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
- SUMD: Super U-shaped Matrix Decomposition Convolutional neural network for Image denoising [0.0]
We introduce the matrix decomposition module(MD) in the network to establish the global context feature.
Inspired by the design of multi-stage progressive restoration of U-shaped architecture, we further integrate the MD module into the multi-branches.
Our model(SUMD) can produce comparable visual quality and accuracy results with Transformer-based methods.
arXiv Detail & Related papers (2022-04-11T04:38:34Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z) - A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.