Let's Give a Voice to Conversational Agents in Virtual Reality
- URL: http://arxiv.org/abs/2308.02665v1
- Date: Fri, 4 Aug 2023 18:51:38 GMT
- Title: Let's Give a Voice to Conversational Agents in Virtual Reality
- Authors: Michele Yin, Gabriel Roccabruna, Abhinav Azad, Giuseppe Riccardi
- Abstract summary: We present an open-source architecture with the goal of simplifying the development of conversational agents in virtual environments.
We present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
- Score: 2.7470819871568506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dialogue experience with conversational agents can be greatly enhanced
with multimodal and immersive interactions in virtual reality. In this work, we
present an open-source architecture with the goal of simplifying the
development of conversational agents operating in virtual environments. The
architecture offers the possibility of plugging in conversational agents of
different domains and adding custom or cloud-based Speech-To-Text and
Text-To-Speech models to make the interaction voice-based. Using this
architecture, we present two conversational prototypes operating in the digital
health domain developed in Unity for both non-immersive displays and VR
headsets.
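The abstract describes a pluggable design: domain-specific conversational agents and custom or cloud-based Speech-To-Text / Text-To-Speech models can be swapped in to make the interaction voice-based. The paper's actual architecture is implemented as an open-source Unity project; the Python sketch below is only an illustration of what such a plug-in structure might look like, and every class and method name here (SpeechToText, TextToSpeech, ConversationalAgent, VoicePipeline) is an assumption rather than the authors' API.

```python
# Illustrative sketch only: the paper's open-source architecture is built in Unity (C#);
# these interfaces and names are assumptions, not the authors' actual code.
from abc import ABC, abstractmethod


class SpeechToText(ABC):
    """Pluggable STT backend (custom or cloud-based)."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        ...


class TextToSpeech(ABC):
    """Pluggable TTS backend (custom or cloud-based)."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        ...


class ConversationalAgent(ABC):
    """Domain-specific dialogue agent (e.g. a digital-health prototype)."""

    @abstractmethod
    def respond(self, user_utterance: str) -> str:
        ...


class VoicePipeline:
    """Wires an STT model, a dialogue agent, and a TTS model into one voice loop."""

    def __init__(self, stt: SpeechToText, agent: ConversationalAgent, tts: TextToSpeech):
        self.stt = stt
        self.agent = agent
        self.tts = tts

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.stt.transcribe(audio_in)    # speech -> text
        text_out = self.agent.respond(text_in)     # dialogue management
        return self.tts.synthesize(text_out)       # text -> speech
```

Under this (assumed) structure, swapping a local STT model for a cloud service, or a digital-health agent for another domain, only requires providing a different implementation of the corresponding interface.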
Related papers
- Moshi: a speech-text foundation model for real-time dialogue [78.88479749811376]
Current systems for spoken dialogue rely on pipelines of independent components, such as voice activity detection and text-to-speech.
We show how Moshi can provide streaming speech recognition and text-to-speech.
Our resulting model is the first real-time full-duplex spoken large language model.
arXiv Detail & Related papers (2024-09-17T17:55:39Z) - RITA: A Real-time Interactive Talking Avatars Framework [6.060251768347276]
RITA presents a high-quality real-time interactive framework built upon generative models.
Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions.
arXiv Detail & Related papers (2024-06-18T22:53:15Z) - Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z) - From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z) - Voice2Action: Language Models as Agent for Efficient Real-Time
Interaction in Virtual Reality [1.160324357508053]
Large Language Models (LLMs) are trained to follow natural language instructions with only a handful of examples.
We propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction.
Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.
arXiv Detail & Related papers (2023-09-29T19:06:52Z) - SAPIEN: Affective Virtual Agents Powered by Large Language Models [2.423280064224919]
We introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models.
The platform allows users to customize their virtual agent's personality, background, and conversation premise.
After the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills.
arXiv Detail & Related papers (2023-08-06T05:13:16Z) - Interactive Conversational Head Generation [68.76774230274076]
We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation.
The capability to automatically synthesize interlocutors that can participate in long and multi-turn conversations is vital and offers benefits for various applications.
arXiv Detail & Related papers (2023-07-05T08:06:26Z) - Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z) - VIRT: Improving Representation-based Models for Text Matching through
Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behaviors of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z) - Building Goal-Oriented Dialogue Systems with Situated Visual Context [12.014793558784955]
With the surge of screen-equipped virtual assistants, the next generation of agents is required to understand screen context.
We propose a novel multimodal conversational framework in which the dialogue agent's next action and its arguments are derived jointly, conditioned on both the conversational and the visual context.
Our model can recognize visual features such as color and shape, as well as metadata-based features such as the price or star rating associated with a visual entity.
arXiv Detail & Related papers (2021-11-22T23:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.