CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR
- URL: http://arxiv.org/abs/2411.04671v1
- Date: Thu, 07 Nov 2024 12:55:17 GMT
- Title: CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR
- Authors: Kadir Burak Buldu, Süleyman Özdel, Ka Hei Carrie Lau, Mengdi Wang, Daniel Saad, Sofie Schönborn, Auxane Boch, Enkelejda Kasneci, Efe Bozkir
- Abstract summary: Large language model (LLM)-powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR.
We provide the community with an open-source, customizable, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with various LLMs, STT, and TTS models.
- Score: 31.49021749468963
- Abstract: Recent developments in computer graphics, machine learning, and sensor technologies enable numerous opportunities for extended reality (XR) setups for everyday life, from skills training to entertainment. With large corporations offering consumer-grade head-mounted displays (HMDs) in an affordable way, it is likely that XR will become pervasive, and HMDs will develop as personal devices like smartphones and tablets. However, having intelligent spaces and naturalistic interactions in XR is as important as technological advances so that users grow their engagement in virtual and augmented spaces. To this end, large language model (LLM)-powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR. In this paper, we provide the community with an open-source, customizable, extensible, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with various LLMs, STT, and TTS models. Our package also supports multiple LLM-powered NPCs per environment and minimizes the latency between different computational models through streaming to achieve usable interactions between users and NPCs. We publish our source code in the following repository: https://gitlab.lrz.de/hctl/cuify
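The abstract's latency point rests on streaming between the STT, LLM, and TTS stages rather than waiting for each stage to finish. The minimal Python sketch below illustrates that idea only; it is not CUIfy's actual Unity/C# API, and every function in it is a hypothetical stub standing in for a real STT, LLM, or TTS backend. The key step is flushing the LLM's partial output to TTS at sentence boundaries so the NPC can start speaking before the full reply exists.

```python
# Illustrative sketch of a streaming STT -> LLM -> TTS loop.
# All functions are hypothetical stubs, not the CUIfy package API.
import asyncio
import re
from typing import AsyncIterator


async def transcribe(audio: bytes) -> str:
    """Stub STT call; a real setup would send audio to an STT model."""
    await asyncio.sleep(0.1)  # simulate model/network latency
    return "Hello, who are you?"


async def stream_llm_reply(prompt: str) -> AsyncIterator[str]:
    """Stub LLM call that yields the reply token by token (streaming)."""
    reply = "I am an NPC living in this virtual museum. Ask me anything."
    for token in reply.split():
        await asyncio.sleep(0.02)  # simulate per-token generation time
        yield token + " "


async def synthesize_and_play(sentence: str) -> None:
    """Stub TTS call; a real setup would synthesize and play audio."""
    await asyncio.sleep(0.1)
    print(f"[NPC speaks] {sentence.strip()}")


async def handle_user_utterance(audio: bytes) -> None:
    text = await transcribe(audio)
    buffer = ""
    async for token in stream_llm_reply(text):
        buffer += token
        # Flush to TTS at sentence boundaries instead of waiting for the
        # complete reply; this is what keeps perceived latency low.
        if re.search(r"[.!?]\s*$", buffer):
            await synthesize_and_play(buffer)
            buffer = ""
    if buffer.strip():
        await synthesize_and_play(buffer)


if __name__ == "__main__":
    asyncio.run(handle_user_utterance(b"<captured microphone audio>"))
```

With this structure, the first spoken sentence becomes audible after roughly one sentence of generation time rather than after the whole reply, which is the kind of usable turn-taking the package aims for.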
Related papers
- OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents [11.928422245125985]
OpenOmni is an open-source, end-to-end pipeline benchmarking tool.
It integrates advanced technologies such as Speech-to-Text, Emotion Detection, Retrieval-Augmented Generation, and Large Language Models.
It supports local and cloud deployment, ensuring data privacy and supporting latency and accuracy benchmarking.
arXiv Detail & Related papers (2024-08-06T09:02:53Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality [28.27036270001756]
This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training.
We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment.
arXiv Detail & Related papers (2024-05-16T14:20:30Z)
- LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs).
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z)
- ChatTracer: Large Language Model Powered Real-time Bluetooth Device Tracking System [7.21848268647674]
We present ChatTracer, an LLM-powered real-time Bluetooth device tracking system.
ChatTracer comprises an array of Bluetooth sniffing nodes, a database, and a fine-tuned LLM.
We have built a prototype of ChatTracer with four sniffing nodes.
arXiv Detail & Related papers (2024-03-28T21:04:11Z)
- Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy [37.061999275101904]
We argue for using large language models in XR by embedding them in avatars or as narratives to facilitate inclusion.
We speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions.
arXiv Detail & Related papers (2024-02-06T11:19:40Z)
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems [53.94772445896213]
Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society.
We propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.
arXiv Detail & Related papers (2024-01-08T15:01:08Z)
- Agents: An Open-source Framework for Autonomous Language Agents [98.91085725608917]
We consider language agents as a promising direction towards artificial general intelligence.
We release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience.
arXiv Detail & Related papers (2023-09-14T17:18:25Z)
- GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System [8.660929270060146]
This technical paper introduces a chatting robot system that utilizes recent advancements in large-scale language models (LLMs).
The system is integrated with a co-speech gesture generation system, which selects appropriate gestures based on the conceptual meaning of speech.
arXiv Detail & Related papers (2023-05-10T10:14:16Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)