Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human
Motion Dataset for Autonomous Robots
- URL: http://arxiv.org/abs/2306.15852v1
- Date: Wed, 28 Jun 2023 00:58:44 GMT
- Title: Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human
Motion Dataset for Autonomous Robots
- Authors: Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das and Debasish Ghose
- Abstract summary: We introduce the Robot Autonomous Motion (RoAM) video dataset.
It is collected with a custom-made TurtleBot3 Burger robot in a variety of indoor environments recording various human motions from the robot's ego-vision.
The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents.
- Score: 1.7778609937758327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing adoption of robots across industries, it is crucial to
focus on developing advanced algorithms that enable robots to anticipate,
comprehend, and plan their actions effectively in collaboration with humans. We
introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected
with a custom-made TurtleBot3 Burger robot in a variety of indoor environments
recording various human motions from the robot's ego-vision. The dataset also
includes synchronized records of the LiDAR scan and all control actions taken
by the robot as it navigates around static and moving human agents. This unique
dataset provides an opportunity to develop and benchmark new visual prediction
frameworks that can predict future image frames based on the actions taken by
the recording agent, in partially observable scenarios or when the imaging
sensor is mounted on a moving platform. We benchmark the dataset on our novel
deep visual prediction framework, ACPNet, in which the predicted future image
frames are also conditioned on the actions taken by the robot, and we
demonstrate its potential for incorporating robot dynamics into the video
prediction paradigm for mobile robotics and autonomous navigation research.
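The abstract's central technical idea is conditioning predicted future frames on the robot's own control actions. The sketch below illustrates that idea in PyTorch; the class name, layer sizes, and the assumption of a two-dimensional (linear, angular) velocity action are illustrative choices, not the actual ACPNet architecture or the RoAM data format.

```python
# Minimal sketch of an action-conditioned next-frame predictor in the spirit
# of ACPNet. All names, shapes, and hyper-parameters are assumptions made for
# illustration, not the authors' implementation.
import torch
import torch.nn as nn

class ActionConditionedPredictor(nn.Module):
    def __init__(self, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        # Encode the current ego-view frame into a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Project the robot's control action (e.g. linear/angular velocity)
        # so it can be broadcast onto the feature map.
        self.action_proj = nn.Linear(action_dim, hidden)
        # Decode the action-conditioned features into the next frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(frame)                       # (B, hidden, h, w)
        act = self.action_proj(action)[..., None, None]  # (B, hidden, 1, 1)
        return self.decoder(feat + act)                  # condition on action

# Usage: predict the next ego-view frame from the current frame and action.
model = ActionConditionedPredictor()
frame = torch.rand(1, 3, 64, 64)       # ego-view RGB frame (illustrative size)
action = torch.tensor([[0.2, -0.1]])   # hypothetical (linear, angular) command
next_frame = model(frame, action)
```

In training, the predicted frame would be supervised against the recorded frame that actually followed the logged control action; the broadcast addition of the projected action onto the encoder features is simply one common way to inject such a conditioning signal.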
Related papers
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z) - Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z) - Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Combining Vision and Tactile Sensation for Video Prediction [0.0]
We investigate the impact of integrating tactile feedback into video prediction models for physical robot interactions.
We introduce two new datasets of robot pushing that use a magnetic-based tactile sensor for unsupervised learning.
Our results demonstrate that incorporating tactile feedback into video prediction models improves scene prediction accuracy and enhances the agent's perception of physical interactions.
arXiv Detail & Related papers (2023-04-21T18:02:15Z) - HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation [16.36110033895749]
We propose the dataset HabitatDyn, which contains synthetic RGB videos, semantic labels, and depth information, as well as kinematics information.
HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities.
arXiv Detail & Related papers (2023-04-21T09:57:35Z) - Zero-Shot Robot Manipulation from Passive Human Videos [59.193076151832145]
We develop a framework for extracting agent-agnostic action representations from human videos.
Our framework is based on predicting plausible human hand trajectories.
We deploy the trained model zero-shot for physical robot manipulation tasks.
arXiv Detail & Related papers (2023-02-03T21:39:52Z) - Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
In addition to the content distribution, our model learns a motion distribution, a novel component for handling the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.