When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous
- URL: http://arxiv.org/abs/2006.09959v2
- Date: Tue, 3 Nov 2020 13:57:23 GMT
- Title: When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous
- Authors: Xi Sun, Xinshuo Weng and Kris Kitani
- Abstract summary: We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU).
Our proposed method is able to accurately localize a target person with 80.7% accuracy using only 5 seconds of IMU data and video.
- Score: 29.922954461039698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We aim to enable robots to visually localize a target person through the aid
of an additional sensing modality -- the target person's 3D inertial
measurements. The need for such technology may arise when a robot is to meet a
person in a crowd for the first time or when an autonomous vehicle must
rendezvous with a rider amongst a crowd without knowing the appearance of the
person in advance. A person's inertial information can be measured with a
wearable device such as a smart-phone and can be shared selectively with an
autonomous system during the rendezvous. We propose a method to learn a
visual-inertial feature space in which the motion of a person in video can be
easily matched to the motion measured by a wearable inertial measurement unit
(IMU). The transformation of the two modalities into the joint feature space is
learned through the use of a contrastive loss which forces inertial motion
features and video motion features generated by the same person to lie close in
the joint feature space. To validate our approach, we compose a dataset of over
60,000 video segments of moving people along with wearable IMU data. Our
experiments show that our proposed method is able to accurately localize a
target person with 80.7% accuracy using only 5 seconds of IMU data and video.
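
As a rough illustration of the training setup the abstract describes, the sketch below (in PyTorch) embeds a roughly 5-second IMU window and a per-person video motion track into a shared space and applies an InfoNCE-style contrastive loss so that matching pairs lie close together. The encoder architectures, input layouts, and exact loss form are assumptions made for illustration, not the authors' implementation.

```python
# Rough sketch of the two-branch embedding the abstract describes, in PyTorch.
# Encoder architectures, input layouts (6-axis IMU; 2D keypoint tracks for the
# video branch) and the InfoNCE-style loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Embeds a ~5 s window of 6-axis IMU data (accelerometer + gyroscope)."""
    def __init__(self, in_ch=6, dim=128):
        super().__init__()
        self.gru = nn.GRU(in_ch, dim, batch_first=True)

    def forward(self, x):                      # x: (B, T_imu, 6)
        _, h = self.gru(x)
        return F.normalize(h[-1], dim=-1)      # unit-length embedding (B, dim)

class VideoMotionEncoder(nn.Module):
    """Embeds a person's visual motion; here assumed to be a 2D keypoint track."""
    def __init__(self, in_dim=34, dim=128):    # e.g. 17 joints x (x, y) per frame
        super().__init__()
        self.gru = nn.GRU(in_dim, dim, batch_first=True)

    def forward(self, x):                      # x: (B, T_vid, in_dim)
        _, h = self.gru(x)
        return F.normalize(h[-1], dim=-1)

def contrastive_loss(z_imu, z_vid, temperature=0.1):
    """Pulls same-person IMU/video pairs together, pushes other pairs apart."""
    logits = z_imu @ z_vid.t() / temperature                 # (B, B) similarities
    targets = torch.arange(z_imu.size(0), device=z_imu.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

At rendezvous time, the shared IMU window would be embedded once and scored (e.g., by dot product) against the embedding of every tracked person in the frame, with the highest-scoring track taken as the target.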
Related papers
- Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation [6.782362178252351]
We introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning framework.
Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as gesture prior knowledge.
Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied.
arXiv Detail & Related papers (2024-05-14T21:20:27Z)
- EgoNav: Egocentric Scene-aware Human Trajectory Prediction [15.346096596482857]
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons.
Such a robot needs to be able to constantly adapt to the surrounding scene based on egocentric vision, and predict the ego motion of the wearer.
In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings.
arXiv Detail & Related papers (2024-03-27T21:43:12Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
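
As a hedged illustration of the Transformer-based design this entry refers to, the sketch below encodes a person's past positions, optionally augmented with visual-cue tokens such as 2D pose, and regresses future positions. The tokenization, pooled readout, and sizes are assumptions, not Social-Transmotion's actual architecture.

```python
# Hedged sketch of a Transformer-based trajectory predictor with optional
# visual-cue "prompt" tokens; tokenization, readout and sizes are assumptions.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, d_model=64, pred_len=12, cue_dim=34):
        super().__init__()
        self.embed_xy = nn.Linear(2, d_model)          # past (x, y) -> token
        self.embed_cue = nn.Linear(cue_dim, d_model)   # e.g. 2D pose -> token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, pred_len * 2)   # future (x, y) positions
        self.pred_len = pred_len

    def forward(self, past_xy, pose_cues=None):
        # past_xy: (B, T_obs, 2); pose_cues: (B, T_obs, cue_dim) or None
        tokens = self.embed_xy(past_xy)
        if pose_cues is not None:                      # cues act as optional prompts
            tokens = torch.cat([tokens, self.embed_cue(pose_cues)], dim=1)
        enc = self.encoder(tokens)
        out = self.head(enc.mean(dim=1))               # simple pooled readout
        return out.view(-1, self.pred_len, 2)          # (B, pred_len, 2)
```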
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars [80.05743236282564]
Real-time tracking of human body motion is crucial for immersive experiences in AR/VR.
We present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers.
We show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
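
A minimal sketch of the kind of policy such a framework might train: sparse headset and controller signals mapped to joint targets for a simulated avatar. The observation layout, network size, and action space are assumptions, and the reinforcement-learning loop itself is omitted.

```python
# Minimal sketch of a policy mapping sparse headset/controller signals to
# avatar joint targets. Observation layout, network size and the PD-target
# action space are assumptions; the RL training loop is omitted entirely.
import torch
import torch.nn as nn

class SparseSensorPolicy(nn.Module):
    def __init__(self, num_joints=33):
        super().__init__()
        # 3 devices (HMD + 2 controllers) x (position 3 + quaternion 4 + linear velocity 3)
        obs_dim = 3 * 10
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_joints),                # e.g. PD targets per joint
        )

    def forward(self, obs):                            # obs: (B, 30)
        return self.net(obs)
```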
arXiv Detail & Related papers (2022-09-20T00:25:54Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- Learning to Control Complex Robots Using High-Dimensional Interfaces: Preliminary Insights [22.719193009150867]
We explore the use of limited upper-body motions, captured via motion sensors, as inputs to control a 7 degree-of-freedom robotic arm.
It is possible that even dense sensor signals lack the salient information and independence necessary for reliable high-dimensional robot control.
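
For intuition, the sketch below shows one common way such an interface can be built: project the high-dimensional body-motion signal onto its principal components and use the leading components as 7-DoF joint-velocity commands. PCA and the gain are assumptions rather than the study's protocol; the entry's point is precisely that such signals may lack the salience and independence this mapping requires.

```python
# Illustrative interface: project dense body-motion signals onto their
# principal components and use the leading components as 7-DoF velocity
# commands. PCA and the gain are assumptions, not the study's protocol.
import numpy as np

def fit_interface(calibration_signals, n_dof=7):
    """calibration_signals: (T, D) sensor readings recorded during free movement.
    Returns the signal mean and a (D, n_dof) projection matrix."""
    mean = calibration_signals.mean(axis=0)
    _, _, vt = np.linalg.svd(calibration_signals - mean, full_matrices=False)
    return mean, vt[:n_dof].T

def to_joint_velocities(sensor_frame, mean, projection, gain=0.5):
    """Map one (D,) sensor reading to a 7-dimensional joint-velocity command."""
    return gain * (sensor_frame - mean) @ projection
```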
arXiv Detail & Related papers (2021-10-09T23:38:22Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
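
As a hedged sketch of how a visibility indicator can enter a graph-attention update over body joints, the snippet below masks invisible joints out of the attention keys; the layer design and sizes are assumptions, not TRiPOD's actual formulation.

```python
# Sketch of a graph-attention update over body joints that removes joints
# flagged as invisible from the attention keys; the formulation is an
# assumption, not TRiPOD's layer. Assumes at least one visible joint per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedJointAttention(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, joints, visible):
        # joints: (B, J, dim) per-joint features; visible: (B, J) with values in {0, 1}
        scores = self.q(joints) @ self.k(joints).transpose(1, 2)          # (B, J, J)
        scores = scores / joints.size(-1) ** 0.5
        scores = scores.masked_fill(visible.unsqueeze(1) == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)                                  # rows sum to 1
        return attn @ self.v(joints)                                      # (B, J, dim)
```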
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Self-Supervised Motion Retargeting with Safety Guarantee [12.325683599398564]
We present a data-driven motion retargeting method that enables the generation of natural motions in humanoid robots from motion capture data or RGB videos.
Our method can generate expressive robotic motions from both the CMU motion capture database and YouTube videos.
arXiv Detail & Related papers (2021-03-11T04:17:26Z)
- Careful with That! Observation of Human Movements to Estimate Objects Properties [106.925705883949]
We focus on the features of human motor actions that communicate insights on the weight of an object.
Our final goal is to enable a robot to autonomously infer the degree of care required in object handling.
arXiv Detail & Related papers (2021-03-02T08:14:56Z)
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [93.03056743850141]
We present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image.
We show that it is possible to rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule.
arXiv Detail & Related papers (2020-09-01T10:12:30Z)
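
To make the last entry's distinction concrete, the sketch below flags a social interaction only when two people are both close and roughly facing each other, rather than applying a distance rule alone; the thresholds and the facing test are assumptions.

```python
# Sketch of an interaction test that combines proximity with mutual facing
# direction instead of a distance-only rule; thresholds are assumptions.
import numpy as np

def is_interacting(p1, yaw1, p2, yaw2, max_dist=2.0, max_angle=np.radians(45)):
    """p1, p2: (x, z) ground-plane positions in metres; yaw1, yaw2: body
    orientations in radians (0 = facing the +x direction)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    if np.linalg.norm(p2 - p1) > max_dist:
        return False                                   # too far apart to interact

    def facing(yaw, src, dst):
        to_other = (dst - src) / (np.linalg.norm(dst - src) + 1e-9)
        heading = np.array([np.cos(yaw), np.sin(yaw)])
        return np.arccos(np.clip(heading @ to_other, -1.0, 1.0)) < max_angle

    return facing(yaw1, p1, p2) and facing(yaw2, p2, p1)
```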
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.