GRILLBot: An Assistant for Real-World Tasks with Neural Semantic Parsing
and Graph-Based Representations
- URL: http://arxiv.org/abs/2208.14884v1
- Date: Wed, 31 Aug 2022 14:24:35 GMT
- Title: GRILLBot: An Assistant for Real-World Tasks with Neural Semantic Parsing
and Graph-Based Representations
- Authors: Carlos Gemmell, Iain Mackie, Paul Owoicho, Federico Rossetto, Sophie
Fischer, Jeffrey Dalton
- Abstract summary: GRILLBot is the winning system in the 2022 Alexa Prize TaskBot Challenge.
It is a voice assistant to guide users through complex real-world tasks in the domains of cooking and home improvement.
- Score: 5.545791216381869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GRILLBot is the winning system in the 2022 Alexa Prize TaskBot Challenge,
moving towards the next generation of multimodal task assistants. It is a voice
assistant to guide users through complex real-world tasks in the domains of
cooking and home improvement. These are long-running and complex tasks that
require flexible adjustment and adaptation. The demo highlights the core
aspects, including a novel Neural Decision Parser for contextualized semantic
parsing, a new "TaskGraph" state representation that supports conditional
execution, knowledge-grounded chit-chat, and automatic enrichment of tasks with
images and videos.
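To make the "TaskGraph" idea more concrete, below is a minimal Python sketch of what a graph-based task representation with conditional execution could look like. All class and field names are illustrative assumptions, not GRILLBot's actual implementation or API; the sketch only shows steps as nodes, edges as possible transitions, and per-node conditions that gate which branch is offered next.

```python
# Hypothetical sketch of a TaskGraph-style state representation.
# Names and structure are assumptions for illustration, not the paper's API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StepNode:
    """One step of a task (e.g. 'Preheat the oven to 200C')."""
    step_id: str
    text: str
    image_url: Optional[str] = None                       # optional media enrichment
    condition: Optional[Callable[[dict], bool]] = None    # gate for conditional execution
    next_ids: List[str] = field(default_factory=list)     # outgoing edges


@dataclass
class TaskGraph:
    """Directed graph of steps; conditions decide which branch to follow."""
    nodes: Dict[str, StepNode]
    start_id: str

    def next_step(self, current_id: str, context: dict) -> Optional[StepNode]:
        """Return the first successor whose condition holds in the user context."""
        for nid in self.nodes[current_id].next_ids:
            node = self.nodes[nid]
            if node.condition is None or node.condition(context):
                return node
        return None


# Toy cooking task: branch on whether the user has an oven.
graph = TaskGraph(
    nodes={
        "s1": StepNode("s1", "Mix flour, water and yeast into a dough.",
                       next_ids=["s2", "s3"]),
        "s2": StepNode("s2", "Bake the loaf at 220C for 30 minutes.",
                       condition=lambda ctx: ctx.get("has_oven", False)),
        "s3": StepNode("s3", "Cook the flatbread in a dry pan, 3 minutes per side.",
                       condition=lambda ctx: not ctx.get("has_oven", False)),
    },
    start_id="s1",
)

print(graph.next_step("s1", {"has_oven": False}).text)
# -> "Cook the flatbread in a dry pan, 3 minutes per side."
```

In GRILLBot's setting, a component like the Neural Decision Parser would presumably choose which action to take (advance, repeat, branch), while the graph itself carries the conditional structure and the image/video enrichments mentioned in the abstract.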
Related papers
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving [85.62076860189116]
Video Task Decathlon (VTD) includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels.
We develop our unified network, VTDNet, that uses a single structure and a single set of weights for all ten tasks.
arXiv Detail & Related papers (2023-09-08T16:33:27Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [85.25054021362232]
Large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning.
LLMs could act as a controller to manage existing AI models to solve complicated AI tasks.
We present HuggingGPT, an LLM-powered agent that connects various AI models in machine learning communities.
arXiv Detail & Related papers (2023-03-30T17:48:28Z)
- Alexa, Let's Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance [22.3267314621785]
The Alexa Prize TaskBot challenge builds on the success of the SocialBot challenge by introducing the requirements of interactively assisting humans with real-world tasks.
This paper provides an overview of the TaskBot challenge, describes the infrastructure support provided to the teams with the CoBot Toolkit, and summarizes the approaches the participating teams took to overcome the research challenges.
arXiv Detail & Related papers (2022-09-13T22:01:42Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a few-shot reinforcement learning problem in which each task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering [43.07139534653485]
We present Answer-Me, a task-aware multi-task framework.
We pre-train a vision-language joint model, which is multi-task as well.
Results show state-of-the-art performance, zero-shot generalization, robustness to forgetting, and competitive single-task results.
arXiv Detail & Related papers (2022-05-02T14:53:13Z)
- One-Shot Learning from a Demonstration with Hierarchical Latent Language [43.140223608960554]
We introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents.
The agent observes a single task demonstration in a Minecraft-like grid world, and is then asked to carry out the same task in a new map.
We find that agents that perform text-based inference are better equipped for the challenge under a random split of tasks.
arXiv Detail & Related papers (2022-03-09T15:36:43Z)
- VSGM -- Enhance robot task understanding ability through visual semantic graph [0.0]
We argue that giving robots an understanding of both visual and language semantics will improve their inference ability.
In this paper, we propose a novel method, VSGM, which uses a semantic graph to obtain better visual image features.
arXiv Detail & Related papers (2021-05-19T07:22:31Z)
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
- Deep Multi-Task Augmented Feature Learning via Hierarchical Graph Neural Network [4.121467410954028]
We propose a Hierarchical Graph Neural Network to learn augmented features for deep multi-task learning.
Experiments on real-world datasets show significant performance improvements when using this strategy.
arXiv Detail & Related papers (2020-02-12T06:02:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.