HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for
Autonomous Driving
- URL: http://arxiv.org/abs/2212.07729v1
- Date: Thu, 15 Dec 2022 11:15:14 GMT
- Title: HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for
Autonomous Driving
- Authors: Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou,
Dragomir Anguelov and Cristian Sminchisescu
- Abstract summary: 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
- Score: 95.42203932627102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving is an exciting new industry, posing important research
questions. Within the perception module, 3D human pose estimation is an
emerging technology, which can enable the autonomous vehicle to perceive and
understand the subtle and complex behaviors of pedestrians. While hardware
systems and sensors have dramatically improved over the decades -- with cars
potentially boasting complex LiDAR and vision systems and with a growing
expansion of the available body of dedicated datasets for this newly available
information -- not much work has been done to harness these novel signals for
the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL
(HUMan 3D from Images and LiDAR), efficiently makes use of these complementary
signals in a semi-supervised fashion and outperforms existing methods by a
large margin. It is a fast and compact model for onboard deployment.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features,
which we pass through a sequence of Transformer refinement stages. Quantitative
experiments on the Waymo Open Dataset support these claims, where we achieve
state-of-the-art results on the task of 3D pose estimation.
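As a concrete illustration of the pixel-aligned design the abstract describes, the sketch below samples image features at the projections of LiDAR points, fuses them with the raw point coordinates, and refines the resulting tokens with Transformer stages. It is a minimal PyTorch-style sketch under assumed names and dimensions (the class name, feature sizes, and the mean-pooling aggregation are all illustrative), not the authors' HUM3DIL implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedPoseSketch(nn.Module):
    """Illustrative sketch of the pattern in the abstract: LiDAR points ->
    pixel-aligned image features -> Transformer refinement stages.
    Hypothetical names and sizes; not the authors' HUM3DIL code."""

    def __init__(self, img_feat_dim=64, d_model=128, num_joints=15, num_stages=4):
        super().__init__()
        self.num_joints = num_joints
        # Embed each point's xyz together with its sampled image feature.
        self.point_embed = nn.Linear(3 + img_feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           dim_feedforward=256, batch_first=True)
        self.refine = nn.TransformerEncoder(layer, num_layers=num_stages)
        self.head = nn.Linear(d_model, num_joints * 3)

    def forward(self, points, img_feats, proj_uv):
        # points:    (B, N, 3) LiDAR points inside a pedestrian box
        # img_feats: (B, C, H, W) CNN feature map of the camera image
        # proj_uv:   (B, N, 2) point projections, normalized to [-1, 1]
        sampled = F.grid_sample(img_feats, proj_uv.unsqueeze(2),
                                align_corners=False)        # (B, C, N, 1)
        sampled = sampled.squeeze(-1).transpose(1, 2)        # (B, N, C)
        tokens = self.point_embed(torch.cat([points, sampled], dim=-1))
        tokens = self.refine(tokens)                         # Transformer stages
        pooled = tokens.mean(dim=1)                          # aggregate over points
        return self.head(pooled).view(-1, self.num_joints, 3)
```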
Related papers
- HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective [11.841338298700421]
We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation.
Experiments were conducted using the Rope3D and DAIR-V2X-I datasets, and the results demonstrated the superior performance of the proposed algorithm in detecting both vehicles and cyclists.
arXiv Detail & Related papers (2024-10-10T09:37:33Z)
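The geometry behind the height-based 2D-to-3D projection in the HeightFormer entry above can be made concrete: a calibrated camera's pixel ray, intersected with a horizontal plane at an estimated height, yields a 3D point. The sketch below is generic pinhole geometry under a flat-ground assumption; the function name and arguments are illustrative, and it does not reproduce the paper's Spatial Former or Voxel Pooling Former modules.
```python
import numpy as np

def lift_pixel_with_height(u, v, K, R, cam_center, height):
    """Intersect the viewing ray of pixel (u, v) with the horizontal plane
    z = height in the world frame (generic pinhole + flat-ground geometry)."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R @ ray_cam                             # rotate into world frame
    t = (height - cam_center[2]) / ray_world[2]         # ray/plane intersection
    return cam_center + t * ray_world                   # 3D point in world frame
```
For a roadside camera mounted several meters above the road, lifting a box's ground-contact pixels with height 0 recovers its position on the road plane, and an estimated object height then fixes its vertical extent.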
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving [0.5735035463793008]
3D human pose estimation (HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios.
We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data.
Our approach outperforms state-of-the-art results by up to $\sim$13% on the Waymo Open Dataset in the weakly supervised setting.
arXiv Detail & Related papers (2023-07-27T14:28:50Z)
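One plausible reading of the high-level camera-LiDAR fusion in the weakly supervised entry above is to lift detected 2D keypoints using depth from LiDAR points that project nearby. The sketch below illustrates that generic idea only; the search radius, the median-depth rule, and all names are assumptions rather than the paper's actual method.
```python
import numpy as np

def lift_keypoints_with_lidar(kpts_2d, lidar_uv, lidar_depth, K, radius=8.0):
    """Assign each 2D keypoint the median depth of LiDAR points projecting
    within `radius` pixels, then back-project to 3D camera coordinates."""
    K_inv = np.linalg.inv(K)
    joints_3d = []
    for u, v in kpts_2d:
        near = np.hypot(lidar_uv[:, 0] - u, lidar_uv[:, 1] - v) < radius
        if not near.any():
            joints_3d.append(np.full(3, np.nan))   # no LiDAR support: invalid
            continue
        z = np.median(lidar_depth[near])           # robust local depth estimate
        joints_3d.append(z * (K_inv @ np.array([u, v, 1.0])))
    return np.stack(joints_3d)                     # (num_joints, 3)
```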
- View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection [46.077668660248534]
We propose a novel approach to self-supervise 3D object detection purely from RGB sequences.
Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2023-05-29T09:30:39Z)
- GINA-3D: Learning to Generate Implicit Neural Assets in the Wild [38.51391650845503]
GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians.
We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians.
We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
arXiv Detail & Related papers (2023-04-04T23:41:20Z)
- PillarGrid: Deep Learning-based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR [15.195933965761645]
We propose PillarGrid, a novel cooperative perception method fusing information from multiple 3D LiDARs.
PillarGrid consists of four main phases: 1) cooperative preprocessing of point clouds, 2) pillar-wise voxelization and feature extraction, 3) grid-wise deep fusion of features from multiple sensors, and 4) convolutional neural network (CNN)-based augmented 3D object detection.
Extensive experimentation shows that PillarGrid outperforms state-of-the-art single-LiDAR-based 3D object detection methods in both accuracy and range by a large margin.
arXiv Detail & Related papers (2022-03-12T02:28:41Z)
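Phases 2 and 3 of PillarGrid above (pillar-wise voxelization, then grid-wise fusion of features from multiple sensors) can be sketched in a few lines. The cell size, the per-cell max pooling, and the elementwise-max fusion operator below are illustrative assumptions; the paper's actual feature extraction and fusion are learned and more elaborate.
```python
import torch

def pillarize(points, grid_hw=(200, 200), cell=0.5, feat_dim=4):
    """Scatter (x, y, z, intensity) points into a BEV grid by per-cell max
    pooling, a deliberately simplified stand-in for pillar feature extraction.
    Assumes points are already expressed in non-negative grid coordinates."""
    H, W = grid_hw
    bev = torch.zeros(feat_dim, H, W)
    ix = (points[:, 0] / cell).long().clamp(0, W - 1)
    iy = (points[:, 1] / cell).long().clamp(0, H - 1)
    for p, x, y in zip(points, ix, iy):            # plain loop, kept for clarity
        bev[:, y, x] = torch.maximum(bev[:, y, x], p[:feat_dim])
    return bev

def grid_wise_fuse(bev_onboard, bev_roadside):
    """Grid-wise fusion of two spatially aligned BEV maps; elementwise max
    is one simple choice of fusion operator."""
    return torch.maximum(bev_onboard, bev_roadside)
```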
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A siamese network is integrated into the pipeline of a well-known 3D object detector, and the resulting cross-camera associations are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
arXiv Detail & Related papers (2020-02-19T15:32:38Z)
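To make the siaNMS idea above concrete: rather than suppressing duplicates by box overlap alone, an embedding similarity from a siamese network can decide whether two detections (for example, from different cameras) show the same object. The greedy variant below is a hedged sketch with hypothetical names, not the paper's implementation.
```python
import numpy as np

def siamese_nms(scores, embeds, sim_thresh=0.8):
    """Greedy NMS variant: drop a detection when its siamese embedding is
    too similar (cosine) to an already-kept, higher-scoring detection."""
    norms = np.linalg.norm(embeds, axis=1, keepdims=True)
    unit = embeds / np.maximum(norms, 1e-8)        # cosine-normalized embeddings
    keep = []
    for i in np.argsort(-scores):                  # highest score first
        if all(float(unit[i] @ unit[j]) <= sim_thresh for j in keep):
            keep.append(i)
    return keep                                    # indices of surviving detections
```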