Talk-to-Resolve: Combining scene understanding and spatial dialogue to
resolve granular task ambiguity for a collocated robot
- URL: http://arxiv.org/abs/2111.11099v1
- Date: Mon, 22 Nov 2021 10:42:59 GMT
- Title: Talk-to-Resolve: Combining scene understanding and spatial dialogue to
resolve granular task ambiguity for a collocated robot
- Authors: Pradip Pramanick, Chayan Sarkar, Snehasis Banerjee, Brojeshwar
Bhowmick
- Abstract summary: The utility of collocating robots depends largely on an easy and intuitive mechanism for interacting with humans.
We present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor.
Our system can identify stalemates and resolve them through appropriate dialogue exchange with 82% accuracy.
- Score: 15.408128612723882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The utility of collocating robots depends largely on an easy and intuitive
mechanism for interacting with humans. If a robot accepts task instruction in
natural language, first, it has to understand the user's intention by decoding
the instruction. However, while executing the task, the robot may face
unforeseeable circumstances due to the variations in the observed scene and
therefore requires further user intervention. In this article, we present a
system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent
dialogue exchange with the instructor by observing the scene visually to
resolve the impasse. Through dialogue, it either finds a cue to move forward in
the original plan, an acceptable alternative to the original plan, or
affirmation to abort the task altogether. To recognize a possible stalemate, we
utilize the dense captions of the observed scene and the given instruction
jointly to compute the robot's next action. We evaluate our system based on a
data set of initial instruction and situational scene pairs. Our system can
identify stalemates and resolve them through appropriate dialogue exchange with
82% accuracy. Additionally, a user study reveals that the questions from our
system are more natural (4.02 on average on a scale of 1 to 5) than those from
a state-of-the-art system (3.08 on average).
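
The flow described in the abstract (compare the instruction with the scene captions, ask a question when stuck, then proceed, switch to an alternative, or abort) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Outcome`, `next_step`, and `interpret_reply`, the substring matching, and the keyword rules are all hypothetical stand-ins chosen only to make the three dialogue outcomes concrete.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    """The three dialogue outcomes named in the abstract."""
    PROCEED = "proceed"          # a cue to move forward in the original plan
    ALTERNATIVE = "alternative"  # an acceptable alternative to the original plan
    ABORT = "abort"              # affirmation to abort the task


def next_step(target: str, captions: List[str]) -> Tuple[Optional[Outcome], Optional[str]]:
    """Return (outcome, question). Hypothetical rule: naive substring matching
    of the instructed object against the scene captions."""
    matches = [c for c in captions if target.lower() in c.lower()]
    if len(matches) == 1:
        # Exactly one candidate: no stalemate, carry on with the plan.
        return Outcome.PROCEED, None
    if len(matches) > 1:
        # Several candidates: ask the instructor to disambiguate.
        options = "; ".join(matches)
        return None, f"I see more than one {target} ({options}). Which one do you mean?"
    # No candidate: ask whether to substitute something or stop.
    return None, f"I cannot see a {target}. Should I use something else, or stop?"


def interpret_reply(reply: str) -> Outcome:
    """Map the instructor's reply to an outcome using assumed keyword rules."""
    text = reply.lower()
    if any(word in text for word in ("stop", "abort", "cancel")):
        return Outcome.ABORT
    if "instead" in text:
        return Outcome.ALTERNATIVE
    return Outcome.PROCEED


if __name__ == "__main__":
    scene = ["red cup on the table", "blue cup near the sink"]
    outcome, question = next_step("cup", scene)
    print(question)
    print(interpret_reply("the red one"))
```

A real system would replace the substring match with grounding against dense captions and the keyword rules with proper language understanding; the sketch only shows where each of the three outcomes enters the control flow.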
Related papers
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Dobby: A Conversational Service Robot Driven by GPT-4 [22.701223191699412]
This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for service tasks.
The agent is derived from a large language model, which has learned from a vast corpus of general knowledge.
In addition to generating dialogue, this agent can interface with the physical world by invoking commands on the robot.
arXiv Detail & Related papers (2023-10-10T04:34:00Z)
- Proactive Human-Robot Interaction using Visuo-Lingual Transformers [0.0]
Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction.
We propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve.
arXiv Detail & Related papers (2023-10-04T00:50:21Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- Correcting Robot Plans with Natural Language Feedback [88.92824527743105]
We explore natural language as an expressive and flexible tool for robot correction.
We show that these transformations enable users to correct goals, update robot motions, and recover from planning errors.
Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in simulated environments and real-world environments.
arXiv Detail & Related papers (2022-04-11T15:22:43Z)
- BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [108.41464483878683]
We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks.
We develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions.
When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%.
arXiv Detail & Related papers (2022-02-04T07:30:48Z)
- Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly [18.563562557565483]
We propose the framework "Scene Editing as Teleoperation" (SEaT).
Instead of controlling the robot, users focus on specifying the task's goal.
A user can perform teleoperation without any expert knowledge of the robot hardware.
arXiv Detail & Related papers (2021-10-09T04:22:21Z)
- Dialogue Object Search [11.431837357827396]
We introduce a new task, dialogue object search: A robot is tasked to search for a target object in a human environment.
The robot conducts speech-based dialogue with the human, while sharing the image from its mounted camera.
This task is challenging at multiple levels, from data collection, through algorithm and system development, to evaluation.
arXiv Detail & Related papers (2021-07-22T13:32:14Z)
- Composing Pick-and-Place Tasks By Grounding Language [41.075844857146805]
We present a robot system that follows unconstrained language instructions to pick and place arbitrary objects.
Our approach infers objects and their relationships from input images and language expressions.
Results obtained using a real-world PR2 robot demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-02-16T11:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.