Enhancing Virtual Assistant Intelligence: Precise Area Targeting for
Instance-level User Intents beyond Metadata
- URL: http://arxiv.org/abs/2306.04163v1
- Date: Wed, 7 Jun 2023 05:26:38 GMT
- Title: Enhancing Virtual Assistant Intelligence: Precise Area Targeting for
Instance-level User Intents beyond Metadata
- Authors: Mengyu Chen, Zhenchang Xing, Jieshan Chen, Chunyang Chen and Qinghua
Lu
- Abstract summary: We study virtual assistants capable of processing instance-level user intents based on pixels of application screens.
We propose a novel cross-modal deep learning pipeline, which understands the input vocal or textual instance-level user intents.
We conducted a user study with 10 participants to collect a testing dataset with instance-level user intents.
- Score: 18.333599919653444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual assistants have been widely used by mobile phone users in recent
years. Although their capabilities of processing user intents have been
developed rapidly, virtual assistants in most platforms are only capable of
handling pre-defined high-level tasks supported by extra manual efforts of
developers. However, instance-level user intents, which contain more detailed
objectives arising in complex practical situations, have rarely been studied so far. In
this paper, we explore virtual assistants capable of processing instance-level
user intents based on pixels of application screens, without the requirements
of extra extensions on the application side. We propose a novel cross-modal
deep learning pipeline, which understands the input vocal or textual
instance-level user intents, predicts the targeting operational area, and
detects the absolute button area on screens without any metadata of
applications. We conducted a user study with 10 participants to collect a
testing dataset with instance-level user intents. The testing dataset is then
used to evaluate the performance of our model, which achieves a promising
64.43% accuracy.
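The abstract describes a two-stage design: first predict a coarse operational area of the screen from the vocal or textual intent, then localize the absolute button box inside that area, using only screen pixels and no application metadata. A minimal sketch of that control flow is below; this is not the authors' implementation, and every class and function name here is hypothetical, with trivial placeholders standing in for the cross-modal models.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned screen rectangle in absolute pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, other: "Box") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def predict_operational_area(screen: list, intent: str) -> Box:
    """Stage 1 (stand-in): map the intent to a coarse screen region.

    A real model would fuse a text/speech encoder with a visual
    encoder over the raw pixels; this placeholder just returns the
    top-left quadrant.
    """
    h, w = len(screen), len(screen[0])
    return Box(0, 0, w // 2, h // 2)


def detect_button(screen: list, area: Box) -> Box:
    """Stage 2 (stand-in): absolute button box inside the predicted area.

    Placeholder: a box centred within the operational area.
    """
    return Box(area.x + area.w // 4, area.y + area.h // 4,
               area.w // 2, area.h // 2)


def target_button(screen: list, intent: str) -> Box:
    """Run both stages end to end, as the abstract's pipeline does."""
    area = predict_operational_area(screen, intent)
    return detect_button(screen, area)
```

The key property the sketch preserves is that stage 2 operates only within the region proposed by stage 1, so the final box is always contained in the predicted operational area.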
Related papers
- POET: Prompt Offset Tuning for Continual Human Action Adaptation [61.63831623094721]
We aim to provide users and developers with the capability to personalize their experience by adding new action classes to their device models continually.
We formalize this as privacy-aware few-shot continual action recognition.
We propose a novel spatio-temporal learnable prompt tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks.
arXiv Detail & Related papers (2025-04-25T04:11:24Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects [84.73092715537364]
In this paper, we study a new task of navigating to diverse target objects in a large number of scene types.
We build an end-to-end embodied agent, NatVLM, by fine-tuning a Large Vision Language Model (LVLM) through imitation learning.
Our agent achieves a success rate that surpasses GPT-4o by over 20%.
arXiv Detail & Related papers (2024-10-03T17:49:28Z)
- ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation [30.693616802332745]
This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks.
We propose an advanced Actor-Critic framework, which incorporates a sophisticated GUI driven by an AI agent and adept at handling lengthy procedural tasks.
arXiv Detail & Related papers (2023-12-20T15:28:38Z)
- A Review of Machine Learning Methods Applied to Video Analysis Systems [3.518774226658318]
The paper provides a survey of the development of machine-learning techniques for video analysis.
We provide summaries of the development of self-supervised learning, semi-supervised learning, active learning, and zero-shot learning for applications in video analysis.
arXiv Detail & Related papers (2023-12-08T20:24:03Z)
- Task Relation-aware Continual User Representation Learning [26.514449669395297]
Previous efforts in user modeling mainly focus on learning a task-specific user representation that is designed for a single task.
Recent studies introduce the concept of universal user representation, which is a more generalized representation of a user relevant to a variety of tasks.
Despite their effectiveness, existing approaches for learning universal user representations are impractical in real-world applications.
We propose a novel continual user representation learning method, called TERACON, whose learning capability is not limited as the number of learned tasks increases.
arXiv Detail & Related papers (2023-06-01T08:10:03Z)
- Versatile User Identification in Extended Reality using Pretrained Similarity-Learning [16.356961801884562]
We develop a similarity-learning model and pretrain it on the "Who Is Alyx?" dataset.
In comparison with a traditional classification-learning baseline, our model shows superior performance.
Our approach paves the way for easy integration of pretrained motion-based identification models in production XR systems.
arXiv Detail & Related papers (2023-02-15T08:26:24Z)
- Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models [116.25562358482962]
State-of-the-art neural language models can be used to solve ad-hoc language tasks without the need for supervised training.
PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts.
arXiv Detail & Related papers (2022-08-16T17:17:53Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent explores a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.