Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar
- URL: http://arxiv.org/abs/2502.21029v1
- Date: Fri, 28 Feb 2025 13:22:12 GMT
- Title: Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar
- Authors: Simone Arreghini, Nicholas Carlotti, Mirko Nava, Antonio Paolillo, Alessandro Giusti
- Abstract summary: Most commercially available service robots are equipped with cameras with a narrow field of view, making them blind when a user is approaching from other directions. We propose a self-supervised approach to detect humans and estimate their 2D pose from 1D LiDAR data, using detections from an RGB-D camera as a supervision source. Our model is capable of detecting humans omnidirectionally from 1D LiDAR data in a novel environment, with 71% precision and 80% recall, while retaining an average absolute error of 13 cm in distance and 44° in orientation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Localizing humans is a key prerequisite for any service robot operating in proximity to people. In these scenarios, robots rely on a multitude of state-of-the-art detectors usually designed to operate with RGB-D cameras or expensive 3D LiDARs. However, most commercially available service robots are equipped with cameras with a narrow field of view, making them blind when a user is approaching from other directions, or inexpensive 1D LiDARs whose readings are difficult to interpret. To address these limitations, we propose a self-supervised approach to detect humans and estimate their 2D pose from 1D LiDAR data, using detections from an RGB-D camera as a supervision source. Our approach aims to provide service robots with spatial awareness of nearby humans. After training on 70 minutes of data autonomously collected in two environments, our model is capable of detecting humans omnidirectionally from 1D LiDAR data in a novel environment, with 71% precision and 80% recall, while retaining an average absolute error of 13 cm in distance and 44° in orientation.
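As a rough illustration of the training signal described above, the sketch below pairs each 1D LiDAR scan with a pseudo-label (human presence, distance, and body orientation) produced by an RGB-D person detector. The network architecture, layer sizes, and loss weighting are illustrative assumptions, not the authors' implementation; orientation is regressed as (sin, cos) so the angular error wraps correctly.

```python
# Minimal sketch of the cross-modal self-supervision idea; the data pipeline,
# network, and loss weighting are assumptions, not the paper's released code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LidarHumanNet(nn.Module):
    """Maps a 1D LiDAR scan (any number of range readings) to presence, distance, orientation."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # global pooling: works for any scan length
        )
        # Outputs: presence logit, distance (m), orientation encoded as (sin, cos).
        self.head = nn.Linear(32, 4)

    def forward(self, scan: torch.Tensor) -> torch.Tensor:  # scan: (batch, num_beams)
        return self.head(self.backbone(scan.unsqueeze(1)))

def training_step(model, optimizer, scan, camera_label):
    """One step: the RGB-D detector's output acts as the pseudo-label for the LiDAR model."""
    presence, distance, yaw = camera_label            # pseudo-label from the camera pipeline
    target = torch.tensor([presence, distance, math.sin(yaw), math.cos(yaw)])
    pred = model(scan.unsqueeze(0)).squeeze(0)
    loss = F.binary_cross_entropy_with_logits(pred[0], target[0])
    if presence > 0.5:                                # regress pose only when a person is visible
        loss = loss + F.mse_loss(pred[1:], target[1:])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a setup like this, the camera only needs to see the person during autonomous data collection; at deployment time the LiDAR-only model covers the full 360° around the robot, which is how the abstract's omnidirectional detection can come from a sensor with no inherent notion of "human".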
Related papers
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z)
- Multi-model fusion for Aerial Vision and Dialog Navigation based on human attention aids [69.98258892165767]
We present an aerial navigation task for the 2023 ICCV Conversation History.
We propose an effective fusion-training method for a Human Attention Aided Transformer (HAA-Transformer) model and a Human Attention Aided LSTM (HAA-LSTM) model.
arXiv Detail & Related papers (2023-08-27T10:32:52Z)
- Monocular 2D Camera-based Proximity Monitoring for Human-Machine Collision Warning on Construction Sites [1.7223564681760168]
Struck-by-machine accidents are among the leading causes of casualties on construction sites.
Monitoring workers' proximity to machines to avoid human-machine collisions has therefore become a major concern in construction safety management.
This study proposes a novel framework for proximity monitoring using only an ordinary 2D camera to realize real-time human-machine collision warning.
arXiv Detail & Related papers (2023-05-29T07:47:27Z)
- External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors [22.5939915003931]
We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images.
The images are processed online, locally on smart edge sensors by deep neural networks to detect the robot.
With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model.
arXiv Detail & Related papers (2023-03-07T11:03:33Z)
- DensePose From WiFi [86.61881052177228]
We develop a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions.
Our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches.
arXiv Detail & Related papers (2022-12-31T16:48:43Z)
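The DensePose From WiFi entry above states the input/output contract of the mapping: channel state information (phase and amplitude) in, per-region UV coordinates out. The minimal sketch below only illustrates that contract; the tensor shapes (3x3 antenna pairs, 30 subcarriers, 56x56 output maps) and the tiny encoder-decoder are assumptions for illustration, not the published architecture.

```python
# Illustrative-only sketch of a WiFi-to-DensePose mapping; shapes and layers
# are assumptions, not the published model.
import torch
import torch.nn as nn

class WiFiToUV(nn.Module):
    """Maps stacked CSI phase+amplitude features to per-part masks and UV maps."""
    def __init__(self, antenna_pairs: int = 9, subcarriers: int = 30,
                 num_parts: int = 24, out_size: int = 56):
        super().__init__()
        self.num_parts, self.out_size = num_parts, out_size
        in_features = antenna_pairs * 2 * subcarriers           # 2 = (phase, amplitude)
        self.encoder = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(),
            nn.Linear(512, 128 * 7 * 7), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_parts * 3, kernel_size=1),         # per part: (mask logit, U, V)
        )

    def forward(self, csi: torch.Tensor) -> torch.Tensor:
        # csi: (batch, antenna_pairs, 2 * subcarriers) -> (batch, parts, 3, H, W)
        x = self.encoder(csi.flatten(1)).view(-1, 128, 7, 7)
        out = self.decoder(x)
        return out.view(-1, self.num_parts, 3, self.out_size, self.out_size)
```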
- Domain and Modality Gaps for LiDAR-based Person Detection on Mobile Robots [91.01747068273666]
This paper studies existing LiDAR-based person detectors with a particular focus on mobile robot scenarios.
Experiments revolve around the domain gap between driving and mobile robot scenarios, as well as the modality gap between 3D and 2D LiDAR sensors.
Results provide practical insights into LiDAR-based person detection and facilitate informed decisions for relevant mobile robot designs and applications.
arXiv Detail & Related papers (2021-06-21T16:35:49Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [93.03056743850141]
We present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image.
We show that it is possible to rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule.
arXiv Detail & Related papers (2020-09-01T10:12:30Z)
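To make the "interaction rather than raw distance" idea from the Perceiving Humans entry concrete, the sketch below flags a pair of people as interacting only when they are both close and roughly oriented towards each other. The 2 m and 45° thresholds are illustrative assumptions, not the paper's criterion.

```python
# Illustrative sketch of an interaction-aware proximity check, not the paper's rule.
import math

def facing(x_a, y_a, yaw_a, x_b, y_b) -> bool:
    """True if person A's body orientation points roughly towards person B (within 45 deg)."""
    bearing = math.atan2(y_b - y_a, x_b - x_a)
    diff = math.atan2(math.sin(bearing - yaw_a), math.cos(bearing - yaw_a))  # wrap to [-pi, pi]
    return abs(diff) < math.radians(45)

def interacting(a, b, max_distance: float = 2.0) -> bool:
    """a, b: (x, y, yaw) in metres/radians. Interaction = close AND mutually oriented."""
    x_a, y_a, yaw_a = a
    x_b, y_b, yaw_b = b
    close = math.hypot(x_b - x_a, y_b - y_a) < max_distance
    return close and facing(x_a, y_a, yaw_a, x_b, y_b) and facing(x_b, y_b, yaw_b, x_a, y_a)

print(interacting((0.0, 0.0, 0.0), (1.5, 0.0, math.pi)))  # facing each other at 1.5 m -> True
```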
- Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision [6.089651609511804]
We present an approach for safe and object-independent human-to-robot handovers using real time robotic vision and manipulation.
In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.
arXiv Detail & Related papers (2020-06-02T17:29:20Z)