Learning by Watching: A Review of Video-based Learning Approaches for
Robot Manipulation
- URL: http://arxiv.org/abs/2402.07127v1
- Date: Sun, 11 Feb 2024 08:41:42 GMT
- Authors: Chrisantus Eze and Christopher Crick
- Abstract summary: Recent works have explored learning manipulation skills by passively watching abundant videos sourced online.
This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources.
We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robot learning of manipulation skills is hindered by the scarcity of diverse,
unbiased datasets. While curated datasets can help, challenges remain in
generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild"
video datasets have driven progress in computer vision through self-supervised
techniques. Translating this to robotics, recent works have explored learning
manipulation skills by passively watching abundant videos sourced online.
These video-based learning paradigms show promising results, providing scalable
supervision while reducing dataset bias. This survey reviews foundations such
as video feature representation learning techniques, object affordance
understanding, 3D hand/body modeling, and large-scale robot resources, as well
as emerging techniques for acquiring robot manipulation skills from
uncontrolled video demonstrations. We discuss how learning only from observing
large-scale human videos can enhance generalization and sample efficiency for
robotic manipulation. The survey summarizes video-based learning approaches,
analyzes their benefits over standard datasets, surveys metrics and benchmarks,
and discusses open challenges and future directions in this nascent domain at
the intersection of computer vision, natural language processing, and robot
learning.
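As a concrete illustration of the video feature representation learning techniques such surveys cover, below is a minimal sketch of a time-contrastive, InfoNCE-style objective: frames close in time are treated as positives, and other frames in the batch serve as negatives. The encoder, sizes, and names are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of time-contrastive video representation learning.
# Assumption: temporally nearby frames share semantics (positives), while
# other frames in the batch act as negatives (InfoNCE-style loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameEncoder(nn.Module):
    """Toy CNN encoder mapping RGB frames to embedding vectors."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def time_contrastive_loss(encoder, anchor, positive, temperature=0.1):
    """InfoNCE over a batch: each anchor's positive is a temporally
    nearby frame; the other batch items serve as negatives."""
    za = encoder(anchor)                 # (B, D)
    zp = encoder(positive)               # (B, D)
    logits = za @ zp.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(za.size(0))    # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Usage: anchors at time t, positives at t + small offset
encoder = FrameEncoder()
anchor = torch.randn(8, 3, 64, 64)      # stand-in for frames at time t
positive = torch.randn(8, 3, 64, 64)    # stand-in for nearby frames
loss = time_contrastive_loss(encoder, anchor, positive)
loss.backward()
```

Embeddings learned this way are often frozen and reused as visual backbones for downstream manipulation policies.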
Related papers
- A Survey of Embodied Learning for Object-Centric Robotic Manipulation (arXiv, 2024-08-21)
Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in AI.
Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment.
- VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections (arXiv, 2024-07-30)
We propose a low-cost visual teleoperation system for bimanual manipulation tasks, called VITAL.
Our approach leverages affordable hardware and visual processing techniques to collect demonstrations.
We enhance the generalizability and robustness of the learned policies by utilizing both real and simulated environments.
- Towards Generalist Robot Learning from Internet Video: A Survey (arXiv, 2024-04-30)
Scaling deep learning to huge internet-scraped datasets has yielded remarkably general capabilities in natural language processing and visual understanding and generation.
In robotics, however, data is scarce and expensive to collect, and robot learning has consequently struggled to match the generality of capabilities observed in other domains.
Learning from Videos (LfV) methods seek to address this data bottleneck by augmenting traditional robot data with large internet-scraped video datasets.
- Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods (arXiv, 2023-08-07)
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot (arXiv, 2023-07-02)
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
- Scaling Robot Learning with Semantically Imagined Experience (arXiv, 2023-02-22)
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
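This augmentation route can be pictured as rewriting logged observations with a generative image model while keeping actions and labels fixed. The sketch below is a hedged illustration of that recipe; `edit_image` and the prompt list are hypothetical placeholders, not the paper's pipeline.

```python
# Sketch of semantically imagined experience: augment a robot dataset by
# rewriting observations with a generative image model, reusing the original
# actions and task labels. `edit_image` is a hypothetical stand-in for a
# text-conditioned diffusion editor; the actual pipeline may differ.
import random

def edit_image(image, prompt):
    """Placeholder: in practice, call a text-conditioned diffusion editor.
    Here it is an identity stub so the sketch runs end to end."""
    return image

DISTRACTOR_PROMPTS = [
    "same scene but with a wooden table",
    "same scene with a blue mug added in the background",
    "same scene under dim lighting",
]

def augment_episode(episode, num_variants=2):
    """Yield copies of an episode whose frames are semantically edited.
    Actions and task labels stay unchanged: only appearance varies."""
    for _ in range(num_variants):
        prompt = random.choice(DISTRACTOR_PROMPTS)
        yield {
            "observations": [edit_image(f, prompt) for f in episode["observations"]],
            "actions": episode["actions"],   # unchanged
            "task": episode["task"],         # unchanged
        }
```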
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation (arXiv, 2021-08-06)
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
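For reference, the simplest algorithm such offline studies compare is behavioral cloning: supervised regression from logged observations to demonstrated actions. A minimal sketch, with illustrative dimensions and names rather than the study's actual code:

```python
# Minimal behavioral cloning sketch: supervised regression from observations
# to demonstrated actions, the simplest baseline in offline imitation studies.
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

policy = BCPolicy(obs_dim=32, act_dim=7)   # e.g. a 7-DoF arm action space
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

obs = torch.randn(64, 32)      # stand-in for logged observations
actions = torch.randn(64, 7)   # stand-in for human-demonstrated actions
loss = nn.functional.mse_loss(policy(obs), actions)
opt.zero_grad()
loss.backward()
opt.step()
```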
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills (arXiv, 2021-04-15)
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
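The "reach any goal state in the dataset" objective is commonly realized through hindsight goal relabeling: a future state from the same trajectory is treated as the goal, with a sparse reward for reaching it. The sketch below shows only that relabeling step and is an assumption about the general recipe, not the paper's exact algorithm.

```python
# Sketch of hindsight goal relabeling for goal-conditioned offline RL:
# a future state from the same trajectory becomes the goal, with reward 1
# on the transition that reaches it.
import random

def relabel_with_hindsight(trajectory):
    """trajectory: list of (state, action, next_state) tuples.
    Returns goal-conditioned transitions (state, action, goal, reward, done)."""
    relabeled = []
    for t, (s, a, s_next) in enumerate(trajectory):
        # sample a goal from the future of the same trajectory
        g_idx = random.randint(t, len(trajectory) - 1)
        goal = trajectory[g_idx][2]      # a future next_state
        reached = (g_idx == t)           # goal reached on this very step?
        reward = 1.0 if reached else 0.0
        relabeled.append((s, a, goal, reward, reached))
    return relabeled
```

The relabeled transitions would then feed a goal-conditioned Q-learning or value-learning update.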
- Visual Imitation Made Easy (arXiv, 2020-08-11)
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
- Learning Predictive Models From Observation and Interaction (arXiv, 2019-12-30)
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
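Mixing action-labeled interaction data with action-free observation data typically requires a latent action variable that is inferred for the observation-only clips. A minimal sketch of that idea, with an illustrative architecture that is not the paper's model:

```python
# Sketch of a predictive model trained on both interaction data (true actions)
# and observation-only data (latent actions inferred from frame pairs).
import torch
import torch.nn as nn

class LatentActionPredictor(nn.Module):
    def __init__(self, obs_dim=64, act_dim=8):
        super().__init__()
        self.infer_action = nn.Linear(2 * obs_dim, act_dim)    # from (o_t, o_t+1)
        self.dynamics = nn.Linear(obs_dim + act_dim, obs_dim)  # predicts o_t+1

    def forward(self, obs, next_obs, action=None):
        if action is None:  # observation-only data: infer a latent action
            action = self.infer_action(torch.cat([obs, next_obs], dim=-1))
        pred = self.dynamics(torch.cat([obs, action], dim=-1))
        return nn.functional.mse_loss(pred, next_obs)

model = LatentActionPredictor()
obs, next_obs = torch.randn(16, 64), torch.randn(16, 64)
robot_actions = torch.randn(16, 8)
loss = model(obs, next_obs, robot_actions)   # robot data: real actions
loss = loss + model(obs, next_obs)           # human video: latent actions
loss.backward()
```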
This list is automatically generated from the titles and abstracts of the papers on this site.