Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs
- URL: http://arxiv.org/abs/2402.07938v2
- Date: Tue, 16 Apr 2024 07:39:05 GMT
- Title: Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs
- Authors: Syed Mekael Wasti, Ken Q. Pu, Ali Neshati
- Abstract summary: We present a framework that can serve as an intermediary between a user and their user interface (UI).
We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations.
Our engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions.
- Score: 5.06113628525842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
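To make the architecture concrete, the following is a minimal sketch of how the pieces described above might fit together. It is written in Python with illustrative names (`SemanticAnnotation`, `ComponentRegistry`, `interpret`, `call_llm`); the paper does not publish an API, so everything here is an assumption, not the authors' implementation. The registry stands in for the custom data structure that stores and parses annotations, and the engine serializes those annotations into a prompt so the LLM can classify the target component and extract parameters.

```python
"""Hypothetical sketch of the annotation-driven LLM UI engine.

All names are illustrative assumptions; the paper describes the
architecture but does not specify an implementation.
"""

import json
from dataclasses import dataclass, field


@dataclass
class SemanticAnnotation:
    """Textual semantic mapping for one UI component (the paper's 'annotation')."""
    component_id: str
    role: str  # what the component does, stated in plain language
    expected_params: dict[str, str] = field(default_factory=dict)  # name -> description


class ComponentRegistry:
    """Stand-in for the custom data structure that stores, parses, and scales annotations."""

    def __init__(self) -> None:
        self._annotations: dict[str, SemanticAnnotation] = {}

    def register(self, ann: SemanticAnnotation) -> None:
        self._annotations[ann.component_id] = ann

    def as_prompt_context(self) -> str:
        """Serialize every annotation so the engine can reason over all components at once."""
        return "\n".join(
            f"- {a.component_id}: {a.role}; expects {json.dumps(a.expected_params)}"
            for a in self._annotations.values()
        )


def interpret(utterance: str, registry: ComponentRegistry, call_llm) -> dict:
    """Ask the LLM to classify the target component and extract parameters.

    `call_llm` is an assumed callable (prompt: str) -> str that returns JSON.
    """
    prompt = (
        "You mediate between a user and a UI. Components:\n"
        f"{registry.as_prompt_context()}\n\n"
        f'User said: "{utterance}"\n'
        'Reply with JSON: {"component_id": ..., "params": {...}}'
    )
    return json.loads(call_llm(prompt))


if __name__ == "__main__":
    registry = ComponentRegistry()
    registry.register(SemanticAnnotation(
        component_id="music.play_button",
        role="Plays a song in the music app",
        expected_params={"song": "title of the song to play"},
    ))

    # Stubbed LLM so the sketch runs offline; a real engine would call a model API here.
    fake_llm = lambda prompt: (
        '{"component_id": "music.play_button", "params": {"song": "Clair de Lune"}}'
    )
    print(interpret("play Clair de Lune", registry, fake_llm))
```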
Related papers
- Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
- CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation [61.68049335444254]
Multimodal large language models (MLLMs) have shown remarkable potential as human-like autonomous language agents to interact with real-world environments.
We propose a Comprehensive Cognitive LLM Agent, CoCo-Agent, with two novel approaches: comprehensive environment perception (CEP) and conditional action prediction (CAP).
With our technical design, our agent achieves new state-of-the-art performance on AITW and META-GUI benchmarks, showing promising abilities in realistic scenarios.
arXiv Detail & Related papers (2024-02-19T08:29:03Z)
- PROMISE: A Framework for Developing Complex Conversational Interactions (Technical Report) [33.7054351451505]
We present PROMISE, a framework that facilitates the development of complex language-based interactions with information systems.
We show the benefits of PROMISE in the context of application scenarios within health information systems and demonstrate its ability to handle complex interactions.
arXiv Detail & Related papers (2023-12-06T18:59:11Z)
- Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses [35.74453152447319]
ExploreLLM allows users to structure thoughts, explore different options, navigate through choices and recommendations, and more easily steer models to generate more personalized responses.
We conduct a user study and show that users find it helpful to use ExploreLLM for exploratory or planning tasks, because it provides a useful schema-like structure to the task, and guides users in planning.
The study also suggests that users can more easily personalize responses with high-level preferences when using ExploreLLM.
arXiv Detail & Related papers (2023-12-01T18:31:28Z)
- Interpreting User Requests in the Context of Natural Language Standing Instructions [89.12540932734476]
We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains.
A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue.
arXiv Detail & Related papers (2023-11-16T11:19:26Z)
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models [14.86435467709869]
Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars.
Current simulation frameworks are driven by highly-specialist domain specific languages.
There is often a gap between a concise English utterance and the executable code that captures the user's intent.
arXiv Detail & Related papers (2023-10-26T13:07:01Z)
- AmadeusGPT: a natural language interface for interactive animal behavioral analysis [65.55906175884748]
We introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code.
We show we can produce state-of-the-art performance on the MABE 2022 behavior challenge tasks.
AmadeusGPT presents a novel way to merge deep biological knowledge, large-language models, and core computer vision modules into a more naturally intelligent system.
arXiv Detail & Related papers (2023-07-10T19:15:17Z)
- Query Understanding in the Age of Large Language Models [6.630482733703617]
We describe a generic framework for interactive query rewriting using large language models (LLMs).
A key aspect of our framework is the rewriter's ability to fully specify, in natural language, the machine intent used by the search engine.
We detail the concept, backed by initial experiments, along with open questions for this interactive query understanding framework.
arXiv Detail & Related papers (2023-06-28T08:24:14Z)
- PADL: Language-Directed Physics-Based Character Control [66.517142635815]
We present PADL, which allows users to issue natural language commands for specifying high-level tasks and low-level skills that a character should perform.
We show that our framework can be applied to effectively direct a simulated humanoid character to perform a diverse array of complex motor skills.
arXiv Detail & Related papers (2023-01-31T18:59:22Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
- Intent Features for Rich Natural Language Understanding [7.522454850008495]
Complex natural language understanding modules give dialog systems a richer understanding of user utterances.
These models are often created from scratch, for specific clients and use cases, and require the annotation of large datasets.
We introduce the idea of intent features: domain and topic agnostic properties of intents that can be learned from the syntactic cues only.
arXiv Detail & Related papers (2021-04-18T03:57:02Z)