Continual Learning from Synthetic Data for a Humanoid Exercise Robot
- URL: http://arxiv.org/abs/2102.10034v1
- Date: Fri, 19 Feb 2021 17:05:25 GMT
- Title: Continual Learning from Synthetic Data for a Humanoid Exercise Robot
- Authors: Nicolas Duczek, Matthias Kerzel, Stefan Wermter
- Abstract summary: In a practical scenario, a physical exercise is performed by an expert like a physiotherapist and then used as a reference for a humanoid robot like Pepper to give feedback on a patient's execution of the same exercise.
This paper tackles the first challenge (robustness to the user's positioning in the robot's field of view) by designing an architecture that tolerates translations and rotations relative to the center of the field of view.
For the second challenge, we allow the GWR to grow online on incremental data. For evaluation, we created a novel exercise dataset with virtual avatars called the Virtual-Squat dataset.
- Score: 15.297262564198972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To detect and correct physical exercises, a Grow-When-Required
Network (GWR) with recurrent connections, episodic memory and a novel subnode
mechanism is developed to learn the spatiotemporal relationships of body
movements and poses. Once an exercise is performed, the pose and movement
information of each frame is stored in the GWR. For every frame, the current
pose and motion pair is compared against the GWR's predicted output, allowing
for feedback not only on the pose but also on the velocity of the motion. In a
practical scenario, a physical exercise is performed by an expert, such as a
physiotherapist, and then used as a reference for a humanoid robot like Pepper
to give feedback on a patient's execution of the same exercise. This approach,
however, comes with two challenges. First, the user's distance from the
humanoid robot and position in the robot's camera view must also be considered
by the GWR, requiring robustness to the user's positioning in the robot's
field of view. Second, since both pose and motion depend on the body
measurements of the original performer, the expert's exercise cannot easily be
used directly as a reference. This paper tackles the first challenge by
designing an architecture that tolerates translations and rotations relative
to the center of the field of view. For the second challenge, we allow the GWR
to grow online on incremental data. For evaluation, we created a novel
exercise dataset with virtual avatars, called the Virtual-Squat dataset.
Overall, we claim that our novel GWR-based architecture can use a learned
exercise reference for different body variations through continual online
learning while preventing catastrophic forgetting, enabling an engaging
long-term human-robot interaction with a humanoid robot.
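To make the feedback mechanism more concrete, the following Python sketch illustrates, in heavily simplified form, two ideas from the abstract: normalizing a skeleton so comparisons tolerate translation and body-size differences, and a grow-when-required style memory that either adapts its best-matching node or inserts a new one for each incoming (pose, motion) pair. All names (`normalize_pose`, `GWRSketch`), the thresholds, and the toy data are illustrative assumptions, not the authors' implementation; the real architecture additionally uses recurrent connections, episodic memory and a subnode mechanism that are not modeled here.

```python
import numpy as np


def normalize_pose(joints, root_idx=0):
    """Center a (J, 3) joint array on a root joint and scale it to unit size,
    a simple way to tolerate translation and body-size differences."""
    centered = joints - joints[root_idx]
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / (scale + 1e-8)


class GWRSketch:
    """Minimal grow-when-required flavored memory over (pose, motion) pairs."""

    def __init__(self, insert_threshold=0.3, lr=0.1):
        self.nodes = []                     # stored (pose, motion) prototypes
        self.insert_threshold = insert_threshold
        self.lr = lr

    def step(self, pose, motion):
        """Match the current frame against memory; adapt or grow, then return
        the best-matching prototype and its distance (used as an error signal)."""
        sample = np.concatenate([pose.ravel(), motion.ravel()])
        if not self.nodes:
            self.nodes.append(sample.copy())
            return sample, 0.0
        dists = [np.linalg.norm(sample - n) for n in self.nodes]
        best = int(np.argmin(dists))
        if dists[best] > self.insert_threshold:
            # Grow: no stored prototype explains this frame well enough.
            self.nodes.append(sample.copy())
        else:
            # Online adaptation: move the best-matching node toward the sample.
            self.nodes[best] += self.lr * (sample - self.nodes[best])
        return self.nodes[best], dists[best]


# Toy usage with random "skeletons": compare each frame against the memory's
# best match and split the error into pose and velocity components, mirroring
# the idea of giving feedback on both the pose and the speed of the motion.
rng = np.random.default_rng(0)
gwr = GWRSketch()
prev = normalize_pose(rng.normal(size=(15, 3)))
for _ in range(50):
    curr = normalize_pose(rng.normal(size=(15, 3)))
    motion = curr - prev                    # crude per-frame velocity proxy
    predicted, error = gwr.step(curr, motion)
    pose_error = np.linalg.norm(predicted[:curr.size] - curr.ravel())
    velocity_error = np.linalg.norm(predicted[curr.size:] - motion.ravel())
    prev = curr
print(f"prototypes stored after 50 frames: {len(gwr.nodes)}")
```

Note that in the standard GWR algorithm, node insertion is typically gated by an activity measure and habituation (firing) counters rather than a single distance threshold; the sketch collapses that into one threshold purely for readability.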
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z)
- CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance [16.948256303861022]
In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual.
We propose a person re-identification module based on continual visual adaptation techniques.
We test the framework both in isolation, using recorded videos in a laboratory environment, and in an HRI scenario with a mobile robot.
arXiv Detail & Related papers (2023-10-30T10:24:21Z)
- ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space [9.806227900768926]
This paper introduces a novel deep-learning approach for human-to-robot motion retargeting.
Our method does not require paired human-to-robot data, which facilitates its translation to new robots.
Our model outperforms existing works regarding human-to-robot similarity in terms of efficiency and precision.
arXiv Detail & Related papers (2023-09-11T08:55:04Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening [59.88594294676711]
Modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions.
We propose a system, Skeleton2Humanoid, which performs physics-oriented motion correction at test time.
Experiments on the challenging LaFAN1 dataset show our system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
arXiv Detail & Related papers (2022-10-09T16:15:34Z)
- QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars [80.05743236282564]
Real-time tracking of human body motion is crucial for immersive experiences in AR/VR.
We present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers.
We show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
arXiv Detail & Related papers (2022-09-20T00:25:54Z)
- Occlusion-Robust Multi-Sensory Posture Estimation in Physical Human-Robot Interaction [10.063075560468798]
We use 2D postures from OpenPose over a single camera, and the trajectory of the interacting robot while the human performs a task.
We show that our multi-sensory system resolves human kinematic redundancy better than posture estimation solely using OpenPose or posture estimation solely using the robot's trajectory.
arXiv Detail & Related papers (2022-08-12T20:41:09Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)