Learning Visualization Policies of Augmented Reality for Human-Robot
Collaboration
- URL: http://arxiv.org/abs/2211.07028v1
- Date: Sun, 13 Nov 2022 22:03:20 GMT
- Authors: Kishan Chandan, Jack Albertson, Shiqi Zhang
- Abstract summary: In human-robot collaboration domains, augmented reality (AR) technologies have enabled people to visualize the state of robots.
Current AR-based visualization policies are designed manually, which requires significant human effort and domain knowledge.
We develop a framework, called VARIL, that enables AR agents to learn visualization policies from demonstrations.
- Score: 5.400491728405083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human-robot collaboration domains, augmented reality (AR) technologies
have enabled people to visualize the state of robots. Current AR-based
visualization policies are designed manually, which requires significant human
effort and domain knowledge. When too little information is visualized, human
users find the AR interface not useful; when too much information is
visualized, they find it difficult to process the visualized information. In
this paper, we develop a framework, called VARIL, that enables AR agents to
learn visualization policies (what to visualize, when, and how) from
demonstrations. We created a Unity-based platform for simulating warehouse
environments where human-robot teammates collaborate on delivery tasks. We have
collected a dataset that includes demonstrations of visualizing robots' current
and planned behaviors. Results from experiments with real human participants
show that, compared with competitive baselines from the literature, our learned
visualization strategies significantly increase the efficiency of human-robot
teams, while reducing the distraction level of human users. VARIL has been
demonstrated in a mock warehouse built in our lab.
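The abstract does not specify VARIL's model, but "learning visualization policies from demonstrations" suggests an imitation-learning setup. Below is a minimal behavioral-cloning sketch under assumed ingredients: a hypothetical state featurization (robot distance, task progress) and an illustrative action set (what to visualize), fit with a linear softmax policy. None of these names or features come from the paper.

```python
import numpy as np

# Illustrative sketch only: the paper does not publish VARIL's exact model.
# Actions an AR agent might choose among ("what to visualize") -- assumed.
ACTIONS = ["show_nothing", "highlight_robot", "overlay_planned_path"]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_policy(X, y, n_actions, lr=0.5, epochs=500):
    """Fit a linear softmax policy pi(action | state) to demonstration pairs
    by gradient descent on the cross-entropy loss (behavioral cloning)."""
    n, d = X.shape
    W = np.zeros((d, n_actions))
    Y = np.eye(n_actions)[y]           # one-hot demonstrated actions
    for _ in range(epochs):
        P = softmax(X @ W)             # predicted action distribution
        W -= lr * X.T @ (P - Y) / n    # cross-entropy gradient step
    return W

# Synthetic "demonstrations": features [robot_distance, task_progress, bias].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
X = np.hstack([X, np.ones((300, 1))])
# Hypothetical labeling rule: nearby robot -> highlight it; otherwise,
# mid-task -> show the planned path; else visualize nothing.
y = np.where(X[:, 0] < 0.3, 1, np.where(X[:, 1] > 0.5, 2, 0))

W = train_policy(X, y, len(ACTIONS))
acc = (softmax(X @ W).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

In practice the learned policy would be queried at each timestep with the current robot/human state, and the chosen action rendered by the AR interface; the paper's contribution is learning this mapping from human demonstrations rather than hand-designing it.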
Related papers
- Improving Visual Perception of a Social Robot for Controlled and
In-the-wild Human-robot Interaction [10.260966795508569]
It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning-based visual perception model.
We employ state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot.
arXiv Detail & Related papers (2024-03-04T06:47:06Z) - Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z) - Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z) - RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in
One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z) - Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z) - A Benchmark for Compositional Visual Reasoning [5.576460160219606]
We introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards more data-efficient learning algorithms.
We take inspiration from fluidic intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract rules and associated image datasets at scale.
Our proposed benchmark includes measures of sample efficiency, generalization and transfer across task rules, as well as the ability to leverage compositionality.
arXiv Detail & Related papers (2022-06-11T00:04:49Z) - An Augmented Reality Platform for Introducing Reinforcement Learning to
K-12 Students with Robots [10.835598738100359]
We propose an Augmented Reality (AR) system that reveals the hidden state of the learning process to human users.
This paper describes our system's design and implementation and concludes with a discussion on two directions for future work.
arXiv Detail & Related papers (2021-10-10T03:51:39Z) - Learning Visually Guided Latent Actions for Assistive Teleoperation [9.75385535829762]
We develop assistive robots that condition their latent embeddings on visual inputs.
We show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables i) accurately recognizing the current context and ii) generalizing control embeddings to new objects and tasks.
arXiv Detail & Related papers (2021-05-02T23:58:28Z) - Cognitive architecture aided by working-memory for self-supervised
multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z) - A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.