Learning Geometry-Disentangled Representation for Complementary
Understanding of 3D Object Point Cloud
- URL: http://arxiv.org/abs/2012.10921v3
- Date: Sun, 7 Feb 2021 06:45:10 GMT
- Title: Learning Geometry-Disentangled Representation for Complementary
Understanding of 3D Object Point Cloud
- Authors: Mutian Xu, Junhao Zhang, Zhipeng Zhou, Mingye Xu, Xiaojuan Qi, Yu Qiao
- Abstract summary: We propose Geometry-Disentangled Attention Network (GDANet) for 3D object point cloud understanding.
GDANet disentangles point clouds into the contour and flat parts of 3D objects, denoted respectively by sharp and gentle variation components.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art results with fewer parameters.
- Score: 50.56461318879761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In 2D image processing, some approaches decompose images into high-
and low-frequency components to describe edge and smooth regions respectively.
Similarly, the contour and flat areas of 3D objects, such as the boundary and
seat area of a chair, describe different but complementary geometries. However,
such an investigation is missing from previous deep networks, which understand
point clouds by treating all points or local patches equally. To solve this
problem, we propose the Geometry-Disentangled Attention Network (GDANet). GDANet
introduces a Geometry-Disentangle Module to dynamically disentangle point clouds
into the contour and flat parts of 3D objects, denoted respectively by sharp and
gentle variation components. GDANet then exploits a Sharp-Gentle Complementary
Attention Module that regards the features from the sharp and gentle variation
components as two holistic representations, pays different attention to each,
and fuses each with the original point cloud features. In this way, our method
captures and refines the holistic and complementary 3D geometric semantics from
the two distinct disentangled components to supplement the local information.
Extensive experiments on 3D object classification and segmentation benchmarks
demonstrate that GDANet achieves state-of-the-art results with fewer parameters.
Code is released at
https://github.com/mutianxu/GDANet.
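To make the disentanglement idea concrete, below is a minimal, illustrative sketch. It is not the authors' actual Geometry-Disentangle Module (which learns the decomposition inside the network): here each point is simply scored by how far it sits from the centroid of its k nearest neighbours, and the highest-scoring points approximate the sharp (contour) component while the lowest-scoring points approximate the gentle (flat) component. The helper names `knn` and `disentangle` and the parameters `k` and `m` are assumptions made for this example.

```python
# Illustrative geometry disentanglement by local variation (hypothetical;
# NOT the learned Geometry-Disentangle Module from the paper).
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, 1:k + 1]                      # skip self

def disentangle(points, k=16, m=128):
    """Split an (N, 3) point cloud into sharp / gentle index sets of size m."""
    nbrs = points[knn(points, k)]                           # (N, k, 3)
    centroid = nbrs.mean(axis=1)                            # (N, 3)
    variation = np.linalg.norm(points - centroid, axis=1)   # distance to local centroid
    order = np.argsort(variation)
    return order[-m:], order[:m]   # sharp = largest variation, gentle = smallest

if __name__ == "__main__":
    pts = np.random.rand(1024, 3).astype(np.float32)
    sharp_idx, gentle_idx = disentangle(pts)
    print(sharp_idx.shape, gentle_idx.shape)                # (128,) (128,)
```

In GDANet itself the split is produced dynamically inside the network, and the features of the two subsets are then fused back with the full point cloud features through the Sharp-Gentle Complementary Attention Module described in the abstract.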
Related papers
- Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding [11.416392706435415]
Zero-shot 3D point cloud understanding can be achieved via 2D Vision-Language Models (VLMs).
Existing strategies directly map Vision-Language Model features from 2D pixels of rendered or captured views to 3D points, overlooking the inherent and expressible geometric structure of the point cloud.
We introduce the first training-free aggregation technique that leverages the point cloud's 3D geometric structure to improve the quality of the transferred Vision-Language Models.
arXiv Detail & Related papers (2023-12-04T12:30:07Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Lightweight integration of 3D features to improve 2D image segmentation [1.3799488979862027]
We show that image segmentation can benefit from 3D geometric information without requiring 3D ground truth.
Our method can be applied to many 2D segmentation networks, significantly improving their performance.
arXiv Detail & Related papers (2022-12-16T08:22:55Z) - PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, where a cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern (see the sketch after this list).
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Joint Deep Multi-Graph Matching and 3D Geometry Learning from
Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
In addition, we obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z) - PanoNet3D: Combining Semantic and Geometric Understanding for LiDAR
Point Cloud Detection [40.907188672454986]
We propose to learn both semantic features and geometric structure via a unified multi-view framework.
By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin.
arXiv Detail & Related papers (2020-12-17T06:58:34Z) - ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
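As referenced in the Cylinder3D entry above, the following is a minimal sketch of the cylindrical-partition idea: LiDAR points are converted from Cartesian (x, y, z) to cylindrical (rho, phi, z) coordinates and binned into a regular grid, so cells grow with distance where the scan is sparse. The grid resolution, coordinate ranges, and the function name `cylindrical_voxel_ids` below are assumptions made for illustration, not the paper's settings.

```python
# Illustrative cylindrical voxel partition for LiDAR point clouds
# (assumed grid size and ranges; not the settings used in the paper).
import numpy as np

def cylindrical_voxel_ids(points, grid=(480, 360, 32),
                          rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map (N, 3) Cartesian points to integer voxel indices in (rho, phi, z)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)          # radial distance from the sensor
    phi = np.arctan2(y, x)                  # azimuth angle in [-pi, pi)

    # Normalise each coordinate into [0, 1) within its range, then discretise.
    r = (rho - rho_range[0]) / (rho_range[1] - rho_range[0])
    p = (phi + np.pi) / (2.0 * np.pi)
    h = (z - z_range[0]) / (z_range[1] - z_range[0])
    coords = np.stack([r, p, h], axis=1).clip(0.0, 1.0 - 1e-6)
    return (coords * np.array(grid)).astype(np.int64)       # (N, 3) voxel ids

if __name__ == "__main__":
    pts = np.random.randn(2048, 3) * np.array([20.0, 20.0, 1.0])
    ids = cylindrical_voxel_ids(pts)
    print(ids.min(axis=0), ids.max(axis=0))
```

This only covers the partition step; in that paper, asymmetrical 3D convolution networks then operate on the resulting voxel grid.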
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.