Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality
in Human-Robot Interaction
- URL: http://arxiv.org/abs/2312.07638v1
- Date: Tue, 12 Dec 2023 11:34:43 GMT
- Title: Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality
in Human-Robot Interaction
- Authors: Daniel Weber
- Abstract summary: This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
- Score: 3.1473798197405953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots are becoming increasingly popular in a wide range of environments due
to their exceptional work capacity, precision, efficiency, and scalability.
This development has been further encouraged by advances in Artificial
Intelligence, particularly Machine Learning. By employing sophisticated neural
networks, robots are given the ability to detect and interact with objects in
their vicinity. However, a significant drawback arises from the underlying
dependency on extensive datasets and the availability of substantial amounts of
training data for these object detection models. This issue becomes
particularly problematic when the robot's specific deployment location and its
surroundings are not known in advance. The vast and ever-expanding array
of objects makes it virtually impossible to comprehensively cover the entire
spectrum of existing objects using preexisting datasets alone. The goal of this
dissertation was to teach a robot unknown objects in the context of Human-Robot
Interaction (HRI) in order to liberate it from its dependency on large datasets
and predefined scenarios. In this context, the combination of eye tracking
and Augmented Reality created a powerful synergy that empowered the human
teacher to communicate with the robot and effortlessly point out objects by
means of human gaze. This holistic approach led to the development of a
multimodal HRI system that enabled the robot to identify and visually segment
the Objects of Interest in 3D space. Through the class information provided by
the human, the robot was able to learn the objects and redetect them at a later
stage. Owing to the knowledge gained from this HRI-based teaching, the robot's
object detection capabilities exhibited comparable performance to
state-of-the-art object detectors trained on extensive datasets, without being
restricted to predefined classes, showcasing its versatility and adaptability.
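To make the interaction flow described in the abstract concrete, the following Python sketch outlines one possible teaching loop: a gaze fixation selects the Object of Interest, the robot segments it in 3D, an AR overlay (represented here only by a comment) would confirm the selection, and the human-provided class label is stored for later redetection. It is a minimal illustration under assumed names and data structures (GazeSample, IncrementalDetector, segment_at_gaze), not the dissertation's actual implementation.

```python
# Hypothetical sketch of a gaze-based object-teaching loop; all names are
# illustrative placeholders, not the dissertation's actual code.
from dataclasses import dataclass, field


@dataclass
class GazeSample:
    """A single eye-tracking fixation projected into the robot's 3D workspace."""
    x: float
    y: float
    z: float


@dataclass
class IncrementalDetector:
    """Toy stand-in for an object detector that can learn new classes on the fly."""
    known_classes: dict = field(default_factory=dict)

    def teach(self, class_label: str, segment: dict) -> None:
        # Store the segmented object (e.g., masked point cloud or image crops)
        # under the label provided by the human teacher.
        self.known_classes.setdefault(class_label, []).append(segment)

    def redetect(self, class_label: str) -> bool:
        # Placeholder: a real system would run inference on the current scene.
        return class_label in self.known_classes


def segment_at_gaze(gaze: GazeSample) -> dict:
    """Placeholder for segmenting the Object of Interest around the gaze point."""
    return {"center": (gaze.x, gaze.y, gaze.z), "mask": None}


def teaching_loop(detector: IncrementalDetector, fixations, labels) -> None:
    """One teaching episode: gaze selects the object, the human labels it."""
    for gaze, label in zip(fixations, labels):
        segment = segment_at_gaze(gaze)  # robot segments the gazed-at object in 3D
        # In the real system, an AR overlay would visualize `segment` so the human
        # can confirm that the correct object was selected before labeling it.
        detector.teach(label, segment)


if __name__ == "__main__":
    det = IncrementalDetector()
    teaching_loop(det, [GazeSample(0.4, 0.1, 0.7)], ["mug"])
    print(det.redetect("mug"))  # True once the object has been taught
```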
Related papers
- RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics [26.42651735582044]
We introduce RoboSpatial, a large-scale spatial understanding dataset consisting of real indoor and tabletop scenes captured as 3D scans and egocentric images annotated with rich spatial information relevant to robotics.
Our experiments show that models trained with RoboSpatial outperform baselines on downstream tasks such as spatial affordance prediction, spatial relationship prediction, and robotics manipulation.
arXiv Detail & Related papers (2024-11-25T16:21:34Z) - Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification.
Our approach calibrates object properties by using information from the robot, without relying on data from the object itself.
We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z) - A Survey of Embodied Learning for Object-Centric Robotic Manipulation [27.569063968870868]
Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in AI.
Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment.
arXiv Detail & Related papers (2024-08-21T11:32:09Z) - NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot
Learning in Natural Human-Robot Interaction [19.65778558341053]
Speech-gesture HRI datasets often focus on elementary tasks, like object pointing and pushing.
We introduce NatSGD, a multimodal HRI dataset encompassing human commands through speech and gestures.
We demonstrate its effectiveness in training robots to understand tasks through multimodal human commands.
arXiv Detail & Related papers (2024-03-04T18:02:41Z) - Robo-ABC: Affordance Generalization Beyond Categories via Semantic
Correspondence for Robot Manipulation [20.69293648286978]
We present Robo-ABC, a framework for robotic manipulation that generalizes to out-of-distribution scenes.
We show that Robo-ABC enhances the accuracy of visual affordance retrieval by a large margin.
Robo-ABC achieved a success rate of 85.7%, proving its capacity for real-world tasks.
arXiv Detail & Related papers (2024-01-15T06:02:30Z) - FOCUS: Object-Centric World Models for Robotics Manipulation [4.6956495676681484]
FOCUS is a model-based agent that learns an object-centric world model.
We show that object-centric world models allow the agent to solve tasks more efficiently.
We also showcase how FOCUS could be adopted in real-world settings.
arXiv Detail & Related papers (2023-07-05T16:49:06Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human
Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
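The core mechanism summarized here lends itself to a brief sketch: a binary discriminator receives features from two video clips and predicts whether they depict the same task, and that probability can serve as a task reward. The snippet below is only an illustrative guess at such a discriminator; the architecture, feature dimensions, and names are assumptions, not the paper's actual model or training code.

```python
# Hedged sketch of a same-task video discriminator used as a reward signal.
# Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class SameTaskDiscriminator(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Toy video encoder: average-pool frame features, then a small MLP.
        self.encoder = nn.Sequential(nn.Linear(2048, feat_dim), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, video_a: torch.Tensor, video_b: torch.Tensor) -> torch.Tensor:
        # video_*: (batch, frames, 2048) pre-extracted frame features.
        za = self.encoder(video_a.mean(dim=1))
        zb = self.encoder(video_b.mean(dim=1))
        logits = self.classifier(torch.cat([za, zb], dim=-1))
        return torch.sigmoid(logits)  # probability the two clips share a task


# Usage as a reward: compare the robot's current rollout against a human demo.
disc = SameTaskDiscriminator()
robot_clip = torch.randn(1, 16, 2048)   # stand-in features for a robot rollout
human_demo = torch.randn(1, 16, 2048)   # stand-in features for a human video
reward = disc(robot_clip, human_demo).item()
```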
arXiv Detail & Related papers (2021-03-31T05:25:05Z) - Cognitive architecture aided by working-memory for self-supervised
multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z) - Task-relevant Representation Learning for Networked Robotic Perception [74.0215744125845]
This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective.
Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods.
arXiv Detail & Related papers (2020-11-06T07:39:08Z) - SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)