VoxelKP: A Voxel-based Network Architecture for Human Keypoint
Estimation in LiDAR Data
- URL: http://arxiv.org/abs/2312.08871v1
- Date: Mon, 11 Dec 2023 23:50:14 GMT
- Title: VoxelKP: A Voxel-based Network Architecture for Human Keypoint
Estimation in LiDAR Data
- Authors: Jian Shi, Peter Wonka
- Abstract summary: VoxelKP is a novel fully sparse network architecture tailored for human keypoint estimation in LiDAR data.
We introduce sparse box-attention to focus on learning spatial correlations between keypoints within each human instance.
We incorporate a spatial encoding to leverage absolute 3D coordinates when projecting 3D voxels to a 2D grid encoding a bird's eye view.
- Score: 53.638818890966036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present \textit{VoxelKP}, a novel fully sparse network architecture
tailored for human keypoint estimation in LiDAR data. The key challenge is that
objects are distributed sparsely in 3D space, while human keypoint detection
requires detailed local information wherever humans are present. We propose
four novel ideas in this paper. First, we propose sparse selective kernels to
capture multi-scale context. Second, we introduce sparse box-attention to focus
on learning spatial correlations between keypoints within each human instance.
Third, we incorporate a spatial encoding to leverage absolute 3D coordinates
when projecting 3D voxels to a 2D grid encoding a bird's eye view. Finally, we
propose hybrid feature learning to combine the processing of per-voxel features
with sparse convolution. We evaluate our method on the Waymo dataset and
achieve an improvement of $27\%$ on the MPJPE metric compared to the
state-of-the-art, \textit{HUM3DIL}, trained on the same data, and $12\%$
against the state-of-the-art, \textit{GC-KPL}, pretrained on a $25\times$
larger dataset. To the best of our knowledge, \textit{VoxelKP} is the first
single-staged, fully sparse network that is specifically designed for
addressing the challenging task of 3D keypoint estimation from LiDAR data,
achieving state-of-the-art performances. Our code is available at
\url{https://github.com/shijianjian/VoxelKP}.
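As an illustration of the third idea (leveraging absolute 3D coordinates when projecting sparse voxels to a bird's eye view) and of the MPJPE metric used in the evaluation, here is a minimal NumPy sketch. The function names, argument layout, and the last-write-wins scatter are assumptions made for illustration only; the actual VoxelKP implementation operates on sparse tensors inside the network and is documented in the linked repository.

```python
import numpy as np

def voxels_to_bev(voxel_coords, voxel_feats, grid_hw, voxel_size, pc_range):
    """Scatter sparse 3D voxel features onto a 2D bird's-eye-view grid,
    appending each voxel's absolute 3D centre as a spatial encoding.

    voxel_coords: (N, 3) integer voxel indices (ix, iy, iz)
    voxel_feats:  (N, C) per-voxel features
    grid_hw:      (H, W) size of the BEV grid
    voxel_size:   (3,)   metres per voxel along x, y, z
    pc_range:     (3,)   minimum x, y, z of the point-cloud range
    """
    H, W = grid_hw
    C = voxel_feats.shape[1]
    bev = np.zeros((H, W, C + 3), dtype=float)

    # Absolute 3D voxel centres in metres: the information that would
    # otherwise be lost when the height dimension is collapsed.
    centres = (voxel_coords + 0.5) * np.asarray(voxel_size) + np.asarray(pc_range)

    for (ix, iy, _iz), feat, centre in zip(voxel_coords, voxel_feats, centres):
        # Last write wins here; a real model would pool over the z column.
        bev[iy, ix] = np.concatenate([feat, centre])
    return bev

def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance between
    predicted and ground-truth keypoints, both of shape (num_joints, 3)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```

The per-instance MPJPE values are averaged over the evaluation set; this is the metric that the reported 27% and 12% improvements refer to.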
Related papers
- Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF).
By generating novel poses and feeding them into a trained NeRF model to create new views, the approach enhances KSCR's capabilities in data-scarce environments.
The proposed system can improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis.
arXiv Detail & Related papers (2024-03-15T13:40:37Z)
- CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition [45.16530801796705]
CrossLoc3D is a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting.
We present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans.
arXiv Detail & Related papers (2023-03-31T02:50:52Z)
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking [78.25819070166351]
We propose VoxelNeXt for fully sparse 3D object detection.
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects entirely through voxel features.
arXiv Detail & Related papers (2023-03-20T17:40:44Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Graph R-CNN: Towards Accurate 3D Object Detection with
Semantic-Decorated Local Graph [26.226885108862735]
Two-stage detectors have gained much popularity in 3D object detection.
Most two-stage 3D detectors utilize grid points, voxel grids, or sampled keypoints for RoI feature extraction in the second stage.
This paper addresses the limitations of these RoI feature extraction schemes in three aspects.
arXiv Detail & Related papers (2022-08-07T02:56:56Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant, focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
arXiv Detail & Related papers (2022-04-26T17:34:10Z)
- From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with
Voxel-to-Point Decoder [79.39041453836793]
We present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder.
We propose a residual voxel-to-point decoder to extract point features in addition to the map-view features from the voxel-based Region Proposal Network (RPN).
We propose a simple and efficient method to align the estimated IoUs to the refined proposal boxes as a more relevant localization confidence.
arXiv Detail & Related papers (2021-08-08T14:30:13Z)
- Learning a Compact State Representation for Navigation Tasks by
Autoencoding 2D-Lidar Scans [7.99536002595393]
We generate a compact representation of 2D-lidar scans for reinforcement learning in navigation tasks.
In particular, we incorporate the relation of consecutive scans, especially ego-motion, by applying a memory model.
Experiments show the capability of our approach to highly compress lidar data, maintain a meaningful distribution of the latent space, and even incorporate time-dependent information.
arXiv Detail & Related papers (2021-02-03T16:10:26Z)
- Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN.
By taking full advantage of voxel features in a two-stage approach, our method achieves comparable detection accuracy with state-of-the-art point-based models.
Our results show that Voxel R-CNN delivers a higher detection accuracy while maintaining a real-time frame processing rate, i.e., a speed of 25 FPS on an NVIDIA 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z)