Hierarchical Point Cloud Encoding and Decoding with Lightweight
Self-Attention based Model
- URL: http://arxiv.org/abs/2202.06407v1
- Date: Sun, 13 Feb 2022 21:10:06 GMT
- Authors: En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing
- Abstract summary: SA-CNN is a self-attention based encoding and decoding architecture for representation learning of point cloud data.
We demonstrate that SA-CNN is capable of a wide range of applications, namely classification, part segmentation, reconstruction, shape retrieval, and unsupervised classification.
- Score: 22.338247335791095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present SA-CNN, a hierarchical and lightweight
self-attention based encoding and decoding architecture for representation
learning of point cloud data. The proposed SA-CNN introduces convolution and
transposed convolution stacks to capture and generate contextual information
among unordered 3D points. Following the conventional hierarchical pipeline, the
encoding process extracts features in a local-to-global manner, while the decoding
process generates features and point clouds in coarse-to-fine, multi-resolution
stages. We demonstrate that SA-CNN is capable of a wide range of applications,
namely classification, part segmentation, reconstruction, shape retrieval, and
unsupervised classification. While achieving state-of-the-art or comparable
performance on the benchmarks, SA-CNN keeps its model complexity several
orders of magnitude lower than that of other methods. In terms of qualitative results, we
visualize the multi-stage point cloud reconstructions and latent walks on rigid
objects as well as deformable non-rigid human and robot models.
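The abstract does not spell out the layers of SA-CNN, so the following is only a minimal sketch of the general idea of self-attention over local neighbourhoods of an unordered point set, not the authors' architecture. The function names (`knn_indices`, `local_self_attention`), the use of k-nearest-neighbour grouping, relative coordinates as input features, random projection weights, and max pooling are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbours (itself included)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)           # (N, N) squared distances
    return np.argsort(dist2, axis=1)[:, :k]      # (N, k)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(points, k, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention inside each k-NN group,
    followed by max pooling, giving one feature vector per point."""
    idx = knn_indices(points, k)                 # (N, k)
    local = points[idx] - points[:, None, :]     # (N, k, 3) relative coordinates
    q = local @ w_q                              # (N, k, d)
    key = local @ w_k                            # (N, k, d)
    v = local @ w_v                              # (N, k, d)
    scores = q @ key.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, k, k)
    attended = softmax(scores, axis=-1) @ v      # (N, k, d)
    return attended.max(axis=1)                  # (N, d), order-invariant pooling

# Hypothetical usage with random data and random (untrained) projection weights.
rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))
w_q, w_k, w_v = (rng.normal(size=(3, 16)) for _ in range(3))
feats = local_self_attention(points, 8, w_q, w_k, w_v)
print(feats.shape)
```

Because attention weights are computed from the neighbours themselves and the pooling is a maximum, reordering the input points only reorders the output rows, which is the property that makes this kind of layer suitable for unordered 3D points. A hierarchical encoder would typically repeat such a step on progressively subsampled point sets; that part is omitted here.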
Related papers
- Hierarchical Attention Networks for Lossless Point Cloud Attribute Compression [22.234604407822673]
We propose a deep hierarchical attention context model for attribute compression of point clouds.
A simple and effective Level of Detail (LoD) structure is introduced to yield a coarse-to-fine representation.
Points within the same refinement level are encoded in parallel, sharing a common context point group.
arXiv Detail & Related papers (2025-04-01T07:14:10Z)
- SENetV2: Aggregated dense layer for channelwise and global representations [0.0]
We introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the Squeeze residual module.
This fusion enhances the network's ability to capture channel-wise patterns and have global knowledge.
We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures.
arXiv Detail & Related papers (2023-11-17T14:10:57Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Optimising for Interpretability: Convolutional Dynamic Alignment Networks [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns.
CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
arXiv Detail & Related papers (2021-09-27T12:39:46Z)
- Latent Code-Based Fusion: A Volterra Neural Network Approach [21.25021807184103]
We propose a deep structure encoder using the recently introduced Volterra Neural Networks (VNNs).
We show that the proposed approach achieves much-improved sample complexity over a CNN-based auto-encoder, together with robust classification performance.
arXiv Detail & Related papers (2021-04-10T18:29:01Z)
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs).
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z)
- A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps.
We propose a novel holistically-guided decoder to obtain high-resolution, semantic-rich feature maps.
arXiv Detail & Related papers (2020-12-18T10:51:49Z)
- Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
- Sparse Coding Driven Deep Decision Tree Ensembles for Nuclear Segmentation in Digital Pathology Images [15.236873250912062]
We propose an easily trained yet powerful representation learning approach with performance highly competitive to deep neural networks in a digital pathology image segmentation task.
The method, called sparse coding driven deep decision tree ensembles that we abbreviate as ScD2TE, provides a new perspective on representation learning.
arXiv Detail & Related papers (2020-08-13T02:59:31Z)
- Structural Deep Clustering Network [45.370272344031285]
We propose a Structural Deep Clustering Network (SDCN) to integrate the structural information into deep clustering.
Specifically, we design a delivery operator to transfer the representations learned by autoencoder to the corresponding GCN layer.
In this way, the multiple structures of data, from low-order to high-order, are naturally combined with the multiple representations learned by autoencoder.
arXiv Detail & Related papers (2020-02-05T04:33:40Z)