RITA: A Real-time Interactive Talking Avatars Framework
- URL: http://arxiv.org/abs/2406.13093v1
- Date: Tue, 18 Jun 2024 22:53:15 GMT
- Title: RITA: A Real-time Interactive Talking Avatars Framework
- Authors: Wuxinlin Cheng, Cheng Wan, Yupeng Cao, Sihan Chen,
- Abstract summary: RITA presents a high-quality real-time interactive framework built upon generative models.
Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions.
- Score: 6.060251768347276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RITA presents a high-quality real-time interactive framework built upon generative models, designed with practical applications in mind. Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions. By leveraging the latest advancements in generative modeling, we have developed a versatile platform that not only enhances the user experience through dynamic conversational avatars but also opens new avenues for applications in virtual reality, online education, and interactive gaming. This work showcases the potential of integrating computer vision and natural language processing technologies to create immersive and interactive digital personas, pushing the boundaries of how we interact with digital content.
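The abstract describes a pipeline that turns a user-uploaded photo into an avatar capable of real-time dialogue. As a rough illustration only (the paper does not publish this API; every name below is a hypothetical stand-in for the LLM, TTS, and generative-rendering components such a framework would chain together), one interaction turn might be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A single rendered avatar frame (placeholder payload)."""
    index: int
    viseme: str

def generate_reply(user_text: str) -> str:
    # Stand-in for the LLM dialogue backend.
    return f"You said: {user_text}"

def synthesize_visemes(reply: str) -> list[str]:
    # Stand-in for TTS plus audio-to-viseme alignment;
    # here we just map vowels to mouth shapes.
    return [ch for ch in reply.lower() if ch in "aeiou"]

def render_avatar(visemes: list[str]) -> list[Frame]:
    # Stand-in for the generative renderer animating the uploaded photo.
    return [Frame(i, v) for i, v in enumerate(visemes)]

def interact(user_text: str) -> list[Frame]:
    """One turn of the text -> reply -> speech -> animation loop."""
    reply = generate_reply(user_text)
    visemes = synthesize_visemes(reply)
    return render_avatar(visemes)
```

The point of the sketch is the staging: dialogue generation, speech synthesis, and frame rendering run per turn, so each stage must keep latency low enough for real-time interaction.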
Related papers
- Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds [3.5152339192019113]
Social Conjurer is a framework for AI-augmented dynamic 3D scene co-creation.
This article presents a set of implications for designing human-centered interfaces that incorporate AI models into 3D content generation.
arXiv Detail & Related papers (2024-09-30T23:02:51Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Digital Life Project: Autonomous 3D Characters with Social Intelligence [86.2845109451914]
Digital Life Project is a framework utilizing language as the universal medium to build autonomous 3D characters.
Our framework comprises two primary components: SocioMind and MoMat-MoGen.
arXiv Detail & Related papers (2023-12-07T18:58:59Z)
- AgentAvatar: Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents [16.544688997764293]
Our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agents' facial motions.
These descriptions are processed by our task-agnostic driving engine into continuous motion embeddings.
Our framework adapts to a variety of non-verbal avatar interactions, both monadic and dyadic.
arXiv Detail & Related papers (2023-11-29T09:13:00Z)
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention [55.2825684201129]
DeepSpeed-VisualChat is designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities.
Our framework is notable for (1) its open-source support for multi-round and multi-image dialogues, (2) introducing an innovative multi-modal causal attention mechanism, and (3) utilizing data blending techniques on existing datasets to assure seamless interactions.
arXiv Detail & Related papers (2023-09-25T17:53:29Z)
- SAPIEN: Affective Virtual Agents Powered by Large Language Models [2.423280064224919]
We introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models.
The platform allows users to customize their virtual agent's personality, background, and conversation premise.
After the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills.
arXiv Detail & Related papers (2023-08-06T05:13:16Z)
- Let's Give a Voice to Conversational Agents in Virtual Reality [2.7470819871568506]
We present an open-source architecture with the goal of simplifying the development of conversational agents in virtual environments.
We present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
arXiv Detail & Related papers (2023-08-04T18:51:38Z)
- ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human [76.62897301298699]
ChatPLUG is a Chinese open-domain dialogue system for digital human applications, instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format.
We show that ChatPLUG outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation.
We deploy ChatPLUG with fast inference in real-world applications such as smart speakers and instant messaging.
arXiv Detail & Related papers (2023-04-16T18:16:35Z)
- FaceChat: An Emotion-Aware Face-to-face Dialogue Framework [58.67608580694849]
FaceChat is a web-based dialogue framework that enables emotionally-sensitive and face-to-face conversations.
The system has a wide range of potential applications, including counseling, emotional support, and personalized customer service.
arXiv Detail & Related papers (2023-03-08T20:45:37Z)
- RealityTalk: Real-Time Speech-Driven Augmented Presentation for AR Live Storytelling [7.330145218077073]
We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements.
Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques.
We evaluate our tool from a presenter's perspective to demonstrate the effectiveness of our system.
arXiv Detail & Related papers (2022-08-12T16:12:00Z)
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behaviors of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.