DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion
- URL: http://arxiv.org/abs/2111.10332v1
- Date: Fri, 19 Nov 2021 17:25:54 GMT
- Title: DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion
- Authors: Renrui Zhang, Ziyao Zeng, Ziyu Guo, Xinben Gao, Kexue Fu, Jianbo Shi
- Abstract summary: We propose Dual-Scale Point Cloud Recognition with High-frequency Fusion (DSPoint)
We reverse the conventional design of applying convolution on voxels and attention to points.
Experiments and ablations on the widely adopted ModelNet40, ShapeNet, and S3DIS benchmarks demonstrate the state-of-the-art performance of our DSPoint.
- Score: 17.797795508707864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud processing is a challenging task due to its sparsity and
irregularity. Prior works introduce delicate designs for either local feature
aggregation or global geometric architecture, but few combine both advantages.
We propose Dual-Scale Point Cloud Recognition with High-frequency Fusion
(DSPoint) to extract local-global features by concurrently operating on voxels
and points. We reverse the conventional design of applying convolution on
voxels and attention to points. Specifically, we disentangle point features
through channel dimension for dual-scale processing: one by point-wise
convolution for fine-grained geometry parsing, the other by voxel-wise global
attention for long-range structural exploration. We design a co-attention
fusion module for feature alignment to blend local-global modalities, which
conducts inter-scale cross-modality interaction by communicating high-frequency
coordinate information. Experiments and ablations on the widely adopted
ModelNet40, ShapeNet, and S3DIS demonstrate the state-of-the-art performance of
our DSPoint.
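For intuition, the following is a minimal PyTorch sketch of the dual-scale idea described above: features are disentangled along the channel dimension, one half processed by point-wise convolution for fine-grained geometry, the other mean-pooled into a coarse voxel grid and processed by global attention for long-range structure. The module names, the voxel resolution, the mean-pooling voxelizer, and the plain concatenation that stands in for DSPoint's co-attention fusion module (which additionally exchanges high-frequency coordinate information) are all illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of DSPoint's channel-disentangled
# dual-scale processing, assuming a toy mean-pooling voxelizer and plain
# concatenation in place of the paper's co-attention fusion module.
import torch
import torch.nn as nn


class DualScaleBlock(nn.Module):
    def __init__(self, channels: int = 64, voxel_res: int = 8, heads: int = 4):
        super().__init__()
        assert channels % 2 == 0 and (channels // 2) % heads == 0
        half = channels // 2
        self.voxel_res = voxel_res
        # Local scale: point-wise (1x1) convolution for fine-grained geometry.
        self.point_conv = nn.Conv1d(half, half, kernel_size=1)
        # Global scale: self-attention across voxel tokens for long-range structure.
        self.voxel_attn = nn.MultiheadAttention(half, heads, batch_first=True)

    def forward(self, feats: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) per-point features; coords: (B, N, 3) normalized to [0, 1).
        B, N, _ = feats.shape
        local, glob = feats.chunk(2, dim=-1)  # disentangle along the channel dimension
        half = glob.shape[-1]

        # Local branch: point-wise convolution (Conv1d expects (B, C, N)).
        local = self.point_conv(local.transpose(1, 2)).transpose(1, 2)

        # Global branch: mean-pool point features into a coarse R^3 voxel grid ...
        R = self.voxel_res
        cell = (coords.clamp(0, 1 - 1e-6) * R).long()                # (B, N, 3)
        flat = (cell[..., 0] * R + cell[..., 1]) * R + cell[..., 2]  # (B, N)
        vox = torch.zeros(B, R ** 3, half, device=feats.device)
        cnt = torch.zeros(B, R ** 3, 1, device=feats.device)
        vox.scatter_add_(1, flat.unsqueeze(-1).expand(-1, -1, half), glob)
        cnt.scatter_add_(1, flat.unsqueeze(-1), torch.ones(B, N, 1, device=feats.device))
        vox = vox / cnt.clamp(min=1.0)

        # ... attend over all voxel tokens, then gather each point's token back.
        vox, _ = self.voxel_attn(vox, vox, vox)
        glob = torch.gather(vox, 1, flat.unsqueeze(-1).expand(-1, -1, half))

        # Plain concatenation stands in for the co-attention fusion module,
        # which in the paper also exchanges high-frequency coordinate information.
        return torch.cat([local, glob], dim=-1)


block = DualScaleBlock(channels=64)
feats = torch.randn(2, 1024, 64)   # 2 clouds, 1024 points, 64 channels
coords = torch.rand(2, 1024, 3)    # coordinates normalized to [0, 1)
print(block(feats, coords).shape)  # torch.Size([2, 1024, 64])
```

One appeal of splitting channels instead of duplicating the full feature is that the two branches together cost roughly as much as a single-scale block while still exposing every point to both fine-grained and scene-level context.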
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes [12.506628755166814]
We propose novel convolution operators, termed Twin Deformable Point Convolutions (TDConvs).
These operators aim to achieve adaptive feature learning by learning deformable sampling points in the latitude-longitude plane and altitude direction.
Experiments on existing popular benchmarks show that our TDConvs achieve the best segmentation performance.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- Point Cloud Compression with Implicit Neural Representations: A Unified Framework [54.119415852585306]
We present a pioneering point cloud compression framework capable of handling both geometry and attribute components.
Our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud.
Our method exhibits high universality compared with existing learning-based techniques.
arXiv Detail & Related papers (2024-05-19T09:19:40Z)
- Mesh Denoising Transformer [104.5404564075393]
Mesh denoising is aimed at removing noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
A new representation, known as the Local Surface Descriptor, captures local geometric intricacies.
A Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
arXiv Detail & Related papers (2024-05-10T15:27:43Z)
- Variational Relational Point Completion Network for Robust 3D Classification [59.80993960827833]
Existing point cloud completion methods tend to generate global shape skeletons and hence lack fine local details.
This paper proposes a variational framework, the Variational Relational Point Completion Network (VRCNet), with two appealing properties.
VRCNet shows great generalizability and robustness on real-world point cloud scans.
arXiv Detail & Related papers (2023-04-18T17:03:20Z)
- APPT: Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding [20.87092793669536]
Transformer-based networks have achieved impressive performance in 3D point cloud understanding.
To address their limitations, we propose the Asymmetric Parallel Point Transformer (APPT).
APPT captures features globally throughout the entire network while remaining focused on local details.
arXiv Detail & Related papers (2023-03-31T06:11:02Z)
- LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition [38.540048855119004]
We propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.
The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities.
In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features.
arXiv Detail & Related papers (2021-09-03T03:23:27Z)
- Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)
- 3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level.
A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)