mmPose-NLP: A Natural Language Processing Approach to Precise Skeletal
Pose Estimation using mmWave Radars
- URL: http://arxiv.org/abs/2107.10327v1
- Date: Wed, 21 Jul 2021 19:45:17 GMT
- Authors: Arindam Sengupta and Siyang Cao
- Abstract summary: This paper presents a novel Natural Language Processing (NLP) inspired Sequence-to-Sequence (Seq2Seq) skeletal key-point estimator using millimeter-wave (mmWave) radar data.
To the best of the authors' knowledge, this is the first method to precisely estimate up to 25 skeletal key-points using mmWave radar data alone.
Skeletal pose estimation is critical in several applications, ranging from autonomous vehicles, traffic monitoring, patient monitoring, and gait analysis to defense and security forensics, and aids both preventative and actionable decision making.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we present mmPose-NLP, a novel Natural Language Processing
(NLP) inspired Sequence-to-Sequence (Seq2Seq) skeletal key-point estimator
using millimeter-wave (mmWave) radar data. To the best of the authors'
knowledge, this is the first method to precisely estimate up to 25 skeletal
key-points using mmWave radar data alone. Skeletal pose estimation is critical
in several applications, ranging from autonomous vehicles, traffic monitoring,
patient monitoring, and gait analysis to defense and security forensics, and
aids both preventative and actionable decision making. The use of mmWave radars
for this task, over traditionally employed optical sensors, provides several
advantages, primarily operational robustness to scene lighting and adverse
weather conditions, under which optical sensor performance degrades
significantly. The mmWave radar point-cloud (PCL) data is first voxelized
(analogous to tokenization in NLP), and $N$ frames of the voxelized radar data
(analogous to a text paragraph in NLP) are fed to the proposed mmPose-NLP
architecture, which predicts the voxel indices of the 25 skeletal key-points
(analogous to keyword extraction in NLP). The voxel indices are converted back
to real-world 3-D coordinates using the voxel dictionary employed during the
tokenization process. Mean Absolute Error (MAE) metrics were used to measure
the accuracy of the proposed system against the ground truth, with the proposed
mmPose-NLP offering <3 cm localization errors along the depth, horizontal, and
vertical axes. The effect of the number of input frames on performance/accuracy
was also studied for N = {1, 2, ..., 10}. A comprehensive methodology, results,
discussion, and limitations are presented in this paper. All source code and
results are made available on GitHub to further research and development in
this critical yet emerging domain of skeletal key-point estimation using mmWave
radars.
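The voxelize/predict/de-voxelize pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid bounds, voxel size, and grid dimensions below are assumed values chosen for the example, and the Seq2Seq prediction step itself is omitted.

```python
import numpy as np

# Illustrative grid parameters (assumptions, not the paper's actual values).
GRID_MIN = np.array([-2.0, 0.0, -1.0])  # x (horizontal), y (depth), z (vertical), metres
VOXEL_SIZE = 0.05                       # 5 cm voxels
GRID_DIMS = np.array([80, 120, 60])     # voxels per axis

def voxelize(pcl):
    """Map radar point-cloud points (N, 3) to flat voxel indices ("tokens")."""
    ijk = np.floor((pcl - GRID_MIN) / VOXEL_SIZE).astype(int)
    ijk = np.clip(ijk, 0, GRID_DIMS - 1)
    # Flatten (i, j, k) to a single index, analogous to a token id in NLP.
    return np.ravel_multi_index(ijk.T, GRID_DIMS)

def devoxelize(tokens):
    """Invert the tokenization: voxel indices back to 3-D voxel-centre coordinates."""
    ijk = np.stack(np.unravel_index(tokens, GRID_DIMS), axis=-1)
    return GRID_MIN + (ijk + 0.5) * VOXEL_SIZE

def mae_per_axis(pred, gt):
    """Mean Absolute Error along each axis, the metric used to report <3 cm errors."""
    return np.abs(pred - gt).mean(axis=0)
```

In this scheme the de-voxelization error is bounded by half the voxel size per axis, so the achievable localization accuracy is tied directly to the grid resolution chosen during tokenization.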
Related papers
- Fried Parameter Estimation from Single Wavefront Sensor Image with Artificial Neural Networks [0.9883562565157392]
Atmospheric turbulence degrades the quality of astronomical observations in ground-based telescopes, leading to distorted and blurry images.
Adaptive Optics (AO) systems are designed to counteract these effects, using atmospheric measurements captured by a wavefront sensor to make real-time corrections to the incoming wavefront.
The Fried parameter, r0, characterises the strength of atmospheric turbulence and is an essential control parameter for optimising the performance of AO systems.
We develop a novel data-driven approach, adapting machine learning methods from computer vision for Fried parameter estimation from a single Shack-Hartmann or pyramid wavefront sensor image.
arXiv Detail & Related papers (2025-04-23T18:16:07Z) - TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion.
Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed.
Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%, respectively.
arXiv Detail & Related papers (2025-04-16T05:25:04Z) - Skeleton Detection Using Dual Radars with Integration of Dual-View CNN Models and mmPose [0.0]
This research proposes three Dual-View CNN models, combining PointNet and mmPose and employing two mmWave radars.
While the proposed model shows suboptimal results for random walking, it excels in the arm swing case.
arXiv Detail & Related papers (2024-11-28T16:40:58Z) - ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion [14.83158440666821]
This paper introduces a probability map guided multi-format feature fusion model, ProbRadarM3F.
ProbRadarM3F fuses the traditional heatmap features and the positional features, then effectively achieves the estimation of 14 keypoints of the human body.
arXiv Detail & Related papers (2024-05-08T15:54:57Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
Novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision.
It relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM).
In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - A Benchmark Dataset for Tornado Detection and Prediction using
Full-Resolution Polarimetric Weather Radar Data [4.1241397159763835]
This study introduces a new benchmark dataset, TorNet, to support development of Machine Learning algorithms in tornado detection and prediction.
A novel deep learning (DL) architecture capable of processing raw radar imagery without the need for manual feature extraction is studied.
Despite not benefiting from manual feature engineering or other preprocessing, the DL model shows increased detection performance compared to non-DL and operational baselines.
arXiv Detail & Related papers (2024-01-26T21:47:39Z) - UnLoc: A Universal Localization Method for Autonomous Vehicles using
LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z) - UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM [60.575435353047304]
We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM).
We propose an online framework for sensor uncertainty estimation that can be trained in a self-supervised manner from only 2D input data.
arXiv Detail & Related papers (2023-06-19T16:26:25Z) - Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave
Communications [2.559190942797394]
This study proposes a point cloud-based method for mmWave link quality prediction.
Our proposed method can predict future large attenuation of mmWave received signal strength and throughput.
arXiv Detail & Related papers (2023-01-02T16:51:40Z) - HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar [30.51398364813315]
This paper introduces a novel human pose estimation benchmark, Human Pose with Millimeter Wave Radar (HuPR).
This dataset is created using cross-calibrated mmWave radar sensors and a monocular RGB camera for cross-modality training of radar-based human pose estimation.
arXiv Detail & Related papers (2022-10-22T22:28:40Z) - RVMDE: Radar Validated Monocular Depth Estimation for Robotics [5.360594929347198]
An innate rigid calibration of binocular vision sensors is crucial for accurate depth estimation.
Alternatively, a monocular camera alleviates the limitation at the expense of accuracy in estimating depth, and the challenge exacerbates in harsh environmental conditions.
This work explores the utility of coarse signals from radar when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions.
arXiv Detail & Related papers (2021-09-11T12:02:29Z) - Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z) - End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z) - InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic
Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.