Interactive Mobile App Navigation with Uncertain or Under-specified Natural Language Commands
- URL: http://arxiv.org/abs/2202.02312v1
- Date: Fri, 4 Feb 2022 18:51:50 GMT
- Title: Interactive Mobile App Navigation with Uncertain or Under-specified Natural Language Commands
- Authors: Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a new dataset
where the goal is to complete a natural language query in a mobile app. Current
datasets for related tasks in interactive question answering, visual common
sense reasoning, and question-answer plausibility prediction do not support
research in resolving ambiguous natural language requests or operating in
diverse digital domains. As a result, they fail to capture complexities of real
question answering or interactive tasks. In contrast, MoTIF contains natural
language requests that are not satisfiable, the first such work to investigate
this issue for interactive vision-language tasks. MoTIF also contains follow-up
questions for ambiguous queries to enable research on task uncertainty
resolution. We introduce task feasibility prediction and propose an initial
model which obtains an F1 score of 61.1. We next benchmark task automation with
our dataset and find adaptations of prior work perform poorly due to our
realistic language requests, obtaining an accuracy of only 20.2% when mapping
commands to grounded actions. We analyze performance and draw insights for
future work that may bridge the gap between current model ability and what
real-world applications require.
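The abstract reports task feasibility prediction as an F1 score and task automation as accuracy over grounded actions, but does not define either metric here. The following is a minimal Python sketch of both, under stated assumptions: feasibility is treated as binary classification with "infeasible" as the positive class (the abstract does not say which class is positive or how F1 is averaged), and a grounded action is modeled as a hypothetical `Action(kind, element_id)` pair that counts as correct only when both fields match the reference. The names `f1_score`, `Action`, and `action_accuracy` are illustrative and do not come from the paper's code.

```python
from dataclasses import dataclass
from typing import Sequence


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1; True marks the positive class (assumed here to be "infeasible")."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)      # infeasible, predicted infeasible
    fp = sum(1 for t, p in pairs if not t and p)  # feasible, predicted infeasible
    fn = sum(1 for t, p in pairs if t and not p)  # infeasible, predicted feasible
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass(frozen=True)
class Action:
    """Hypothetical grounded action: an action type plus the target UI element id."""
    kind: str        # e.g. "click" or "type"
    element_id: int  # index of the UI element the action is applied to


def action_accuracy(gold: Sequence[Action], pred: Sequence[Action]) -> float:
    """Fraction of steps whose predicted action matches the reference in both fields."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


if __name__ == "__main__":
    # Two infeasible and two feasible commands: one hit, one miss, one false alarm.
    print(f1_score([True, True, False, False], [True, False, True, False]))  # 0.5
    gold = [Action("click", 3), Action("type", 7)]
    pred = [Action("click", 3), Action("click", 7)]
    print(action_accuracy(gold, pred))  # 0.5
```

Requiring both the action type and the target element to match is one plausible reading of "mapping commands to grounded actions"; a stricter task-level metric would instead count an episode as correct only when every step matches.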
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation (2024-11-12)
  We show that likelihoods serve as an effective gauge for language model performance.
  We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
- Narrative Action Evaluation with Prompt-Guided Multimodal Interaction (2024-04-22)
  Narrative action evaluation (NAE) aims to generate professional commentary that evaluates the execution of an action.
  NAE is a more challenging task because it requires both narrative flexibility and evaluation rigor.
  We propose a prompt-guided multimodal interaction framework to facilitate the interaction between different modalities of information.
- AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness (2024-04-01)
  This paper presents our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
  We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
  We achieve competitive results in the shared task: our system performs best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
- Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs (2023-11-16)
  We propose a task-agnostic framework for resolving ambiguity by asking users clarifying questions.
  We evaluate systems across three NLP applications: question answering, machine translation and natural language inference.
  We find that intent-sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs.
- ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese (2023-10-27)
  We introduce the ViCLEVR dataset, a pioneering collection for evaluating various visual reasoning capabilities in Vietnamese.
  We conduct a comprehensive analysis of contemporary visual reasoning systems, offering insights into their strengths and limitations.
  We present PhoVIT, a comprehensive multimodal fusion model that identifies objects in images based on questions.
- Zero-shot Clarifying Question Generation for Conversational Search (2023-01-30)
  We propose a constrained clarifying question generation system that uses both question templates and query facets to guide effective and precise question generation.
  Experimental results show that our method outperforms existing state-of-the-art zero-shot baselines by a large margin.
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments (2021-04-17)
  We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a dataset with natural language commands for the greatest number of interactive environments to date.
  MoTIF is the first to contain natural language requests for interactive environments that are not satisfiable.
  We perform initial feasibility classification experiments and only reach an F1 score of 37.3, verifying the need for richer vision-language representations.
- TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation (2021-04-09)
  Our approach is based on pretrained transformer models and does not use any language-specific processing or resources.
  Our best model achieves 0.90 accuracy on the English-English subtask, which is competitive with the best result for that subtask (0.93 accuracy).
  Our approach also achieves satisfactory results on other monolingual and cross-lingual language pairs.
- Inquisitive Question Generation for High Level Text Comprehension (2020-10-04)
  We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
  We show that readers engage in a series of pragmatic strategies to seek information.
  We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.