Related papers: Voice CMS: updating the knowledge base of a digital assistant through conversation

URL: http://arxiv.org/abs/2505.22303v1
Date: Wed, 28 May 2025 12:40:37 GMT
Title: Voice CMS: updating the knowledge base of a digital assistant through conversation
Authors: Grzegorz Wolny, Michał Szczerbak
Abstract summary: We propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS). The findings demonstrate that, while the overall usability of the VUI is rated lower than the graphical interface, it is already preferred by users for less complex tasks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this study, we propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS), with a focus on understanding the relationship between user preferences and the complexity of the information being provided. The findings demonstrate that, while the overall usability of the VUI is rated lower than the graphical interface, it is already preferred by users for less complex tasks. Furthermore, the quality of content entered through the VUI is comparable to that achieved with the graphical interface, even for highly complex tasks. Obtained qualitative results suggest that a hybrid interface combining the strengths of both approaches could address the key challenges identified during the experiment, such as reducing cognitive load through graphical feedback while maintaining the intuitive nature of voice-based interactions. This work highlights the potential of conversational interfaces as a viable and effective method for knowledge management in specific business contexts.

Related papers

Evaluating Node-tree Interfaces for AI Explainability [0.5437050212139087]
This study evaluates user experiences with two distinct AI interfaces - node-tree interfaces and chatbots. Our node-tree interface visually structures AI-generated responses into hierarchically organized, interactive nodes. Our findings suggest that AI interfaces capable of switching between structured visualizations and conversational formats can significantly enhance transparency and user confidence in AI-powered systems.
arXiv Detail & Related papers (2025-10-07T20:48:08Z)
Generative Interfaces for Language Models [70.25765232527762]
We propose a paradigm in which large language models (LLMs) respond to user queries by proactively generating user interfaces (UIs). Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. Results show that generative interfaces consistently outperform conversational ones, with up to a 72% improvement in human preference.
arXiv Detail & Related papers (2025-08-26T17:43:20Z)
Learning, Reasoning, Refinement: A Framework for Kahneman's Dual-System Intelligence in GUI Agents [15.303188467166752]
We present CogniGUI, a cognitive framework developed to overcome limitations by enabling adaptive learning for GUI automation resembling human-like behavior. To assess the generalization and adaptability of agent systems, we introduce ScreenSeek, a comprehensive benchmark that includes multi-application navigation, dynamic state transitions, and cross-interface coherence. Experimental results demonstrate that CogniGUI surpasses state-of-the-art methods in both the current GUI grounding benchmarks and our newly proposed benchmark.
arXiv Detail & Related papers (2025-06-22T06:30:52Z)
A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction. Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms. This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems [57.30711059396246]
Current Graphical User Interface (GUI) grounding systems locate interface elements based on natural language instructions. Inspired by human dual-system cognition, we present Focus, a novel GUI grounding framework that combines fast prediction with systematic analysis.
arXiv Detail & Related papers (2025-03-09T06:14:17Z)
InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions [22.007942964950217]
We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses.
arXiv Detail & Related papers (2025-03-06T05:35:19Z)
GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining [67.87810796668981]
Information-Sensitive Cropping (ISC) and Self-Refining Dual Learning (SRDL). Iris achieves state-of-the-art performance across multiple benchmarks with only 850K GUI annotations. These improvements translate to significant gains in both web and OS agent downstream tasks.
arXiv Detail & Related papers (2024-12-13T18:40:10Z)
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
Aguvis is a vision-based framework for autonomous GUI agents. It standardizes cross-platform interactions and incorporates structured reasoning via inner monologue. It achieves state-of-the-art performance across offline and real-world online benchmarks.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data [14.328402787379538]
We introduce AGENTiGraph (Adaptive Generative ENgine for Task-based Interaction and Graphical Representation), a platform for knowledge management through natural language interaction. AGENTiGraph employs a multi-agent architecture to dynamically interpret user intents, manage tasks, and integrate new knowledge. Experimental results on a dataset of 3,500 test cases show AGENTiGraph significantly outperforms state-of-the-art zero-shot baselines.
arXiv Detail & Related papers (2024-10-15T12:05:58Z)
From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing [2.7568948557193287]
Face Recognition (FR) has advanced significantly with the development of deep learning, achieving high accuracy in several applications. The lack of interpretability of these systems raises concerns about their accountability, fairness, and reliability. We propose an interactive framework to enhance the explainability of FR models by combining model-agnostic Explainable Artificial Intelligence (XAI) and Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2024-09-24T13:40:39Z)
Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for Conversational recommender systems. KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities. KERL achieves state-of-the-art results in both recommendation and response generation tasks.
arXiv Detail & Related papers (2023-12-18T06:41:23Z)
Using Textual Interface to Align External Knowledge for End-to-End Task-Oriented Dialogue Systems [53.38517204698343]
We propose a novel paradigm that uses a textual interface to align external knowledge and eliminate redundant processes. We demonstrate our paradigm in practice through MultiWOZ-Remake, including an interactive textual interface built for the MultiWOZ database.
arXiv Detail & Related papers (2023-05-23T05:48:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.