Dialogue Object Search
- URL: http://arxiv.org/abs/2107.10653v1
- Date: Thu, 22 Jul 2021 13:32:14 GMT
- Title: Dialogue Object Search
- Authors: Monica Roy, Kaiyu Zheng, Jason Liu, Stefanie Tellex
- Abstract summary: We introduce a new task, dialogue object search: A robot is tasked to search for a target object in a human environment.
The robot conducts speech-based dialogue with the human, while sharing the image from its mounted camera.
This task is challenging at multiple levels, from data collection to algorithm and system development to evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We envision robots that can collaborate and communicate seamlessly with
humans. It is necessary for such robots to decide both what to say and how to
act, while interacting with humans. To this end, we introduce a new task,
dialogue object search: A robot is tasked to search for a target object (e.g.
fork) in a human environment (e.g., kitchen), while engaging in a "video call"
with a remote human who has additional but inexact knowledge about the target's
location. That is, the robot conducts speech-based dialogue with the human,
while sharing the image from its mounted camera. This task is challenging at
multiple levels, from data collection to algorithm and system development to
evaluation. Despite these challenges, we believe such a task unblocks the path
towards more intelligent and collaborative robots. In this extended abstract,
we motivate and introduce the dialogue object search task and analyze examples
collected from a pilot study. We then discuss our next steps and conclude with
several challenges on which we hope to receive feedback.
Related papers
- Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task
Head movements are crucial for social human-human interaction.
In this work, we employed a generative AI pipeline to produce human-like head movements for a Nao humanoid robot.
The results show that the Nao robot successfully imitates human head movements in a natural manner while actively tracking the speakers during the conversation.
arXiv Detail & Related papers (2024-07-16T17:08:40Z)
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
Habitat 3.0 is a simulation platform for studying collaborative human-robot tasks in home environments.
It addresses challenges in modeling complex deformable bodies and diversity in appearance and motion.
Human-in-the-loop infrastructure enables real human interaction with simulated robots via mouse/keyboard or a VR interface.
arXiv Detail & Related papers (2023-10-19T17:29:17Z)
- HandMeThat: Human-Robot Communication in Physical and Social Environments
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Affordances from Human Videos as a Versatile Representation for Robotics
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z)
- Robots with Different Embodiments Can Express and Influence Carefulness in Object Manipulation
This work investigates the perception of object manipulations performed with a communicative intent by two robots.
We designed the robots' movements to communicate carefulness or not during the transportation of objects.
arXiv Detail & Related papers (2022-08-03T13:26:52Z)
- Talk-to-Resolve: Combining scene understanding and spatial dialogue to resolve granular task ambiguity for a collocated robot
The utility of collocated robots largely depends on an easy and intuitive mechanism for interacting with the human.
We present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor.
Our system can identify stalemates and resolve them with appropriate dialogue exchanges with 82% accuracy.
arXiv Detail & Related papers (2021-11-22T10:42:59Z)
- Let's be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot
Partial subtle mirroring of nonverbal behaviors during conversations (also known as mimicking or parallel empathy) is essential for rapport building.
Our research question is whether integrating an ECA able to mirror its interlocutor's facial expressions and head movements with a human-service robot will improve the user's experience.
Our contribution is the complex integration of an expressive ECA, able to track its interlocutor's face, and to mirror his/her facial expressions and head movements in real time, integrated with a human support robot.
arXiv Detail & Related papers (2021-03-08T01:02:41Z)
- Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks
We propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations.
The robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communications.
Results show that the explanations generated by our approach significantly improve collaboration performance and the user's perception of the robot.
arXiv Detail & Related papers (2020-07-24T23:35:03Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.