RITA: A Real-time Interactive Talking Avatars Framework
- URL: http://arxiv.org/abs/2406.13093v1
- Date: Tue, 18 Jun 2024 22:53:15 GMT
- Title: RITA: A Real-time Interactive Talking Avatars Framework
- Authors: Wuxinlin Cheng, Cheng Wan, Yupeng Cao, Sihan Chen,
- Abstract summary: RITA presents a high-quality real-time interactive framework built upon generative models.
Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions.
- Score: 6.060251768347276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RITA presents a high-quality real-time interactive framework built upon generative models, designed with practical applications in mind. Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions. By leveraging the latest advancements in generative modeling, we have developed a versatile platform that not only enhances the user experience through dynamic conversational avatars but also opens new avenues for applications in virtual reality, online education, and interactive gaming. This work showcases the potential of integrating computer vision and natural language processing technologies to create immersive and interactive digital personas, pushing the boundaries of how we interact with digital content.
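The abstract describes a pipeline that turns a user-uploaded photo into an avatar capable of real-time dialogue. As a rough illustration only (the paper does not publish this API; every name below is a hypothetical stand-in for the LLM, TTS, and generative-rendering components such a framework would chain together), one interaction turn might be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A single rendered avatar frame (placeholder payload)."""
    index: int
    viseme: str

def generate_reply(user_text: str) -> str:
    # Stand-in for the LLM dialogue backend.
    return f"You said: {user_text}"

def synthesize_visemes(reply: str) -> list[str]:
    # Stand-in for TTS plus audio-to-viseme alignment;
    # here we just map vowels to mouth shapes.
    return [ch for ch in reply.lower() if ch in "aeiou"]

def render_avatar(visemes: list[str]) -> list[Frame]:
    # Stand-in for the generative renderer animating the uploaded photo.
    return [Frame(i, v) for i, v in enumerate(visemes)]

def interact(user_text: str) -> list[Frame]:
    """One turn of the text -> reply -> speech -> animation loop."""
    reply = generate_reply(user_text)
    visemes = synthesize_visemes(reply)
    return render_avatar(visemes)
```

The point of the sketch is the staging: dialogue generation, speech synthesis, and frame rendering run per turn, so each stage must keep latency low enough for real-time interaction.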
Related papers
- Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds [3.5152339192019113]
Social Conjurer is a framework for AI-augmented dynamic 3D scene co-creation.
This article presents a set of implications for designing human-centered interfaces that incorporate AI models into 3D content generation.
arXiv Detail & Related papers (2024-09-30T23:02:51Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Digital Life Project: Autonomous 3D Characters with Social Intelligence [86.2845109451914]
Digital Life Project is a framework utilizing language as the universal medium to build autonomous 3D characters.
Our framework comprises two primary components: SocioMind and MoMat-MoGen.
arXiv Detail & Related papers (2023-12-07T18:58:59Z)
- AgentAvatar: Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents [16.544688997764293]
Our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agents' facial motions.
These descriptions are processed by our task-agnostic driving engine into continuous motion embeddings.
Our framework adapts to a variety of non-verbal avatar interactions, both monadic and dyadic.
arXiv Detail & Related papers (2023-11-29T09:13:00Z)
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention [55.2825684201129]
DeepSpeed-VisualChat is designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities.
Our framework is notable for (1) its open-source support for multi-round and multi-image dialogues, (2) introducing an innovative multi-modal causal attention mechanism, and (3) utilizing data blending techniques on existing datasets to assure seamless interactions.
arXiv Detail & Related papers (2023-09-25T17:53:29Z)
- SAPIEN: Affective Virtual Agents Powered by Large Language Models [2.423280064224919]
We introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models.
The platform allows users to customize their virtual agent's personality, background, and conversation premise.
After the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills.
arXiv Detail & Related papers (2023-08-06T05:13:16Z)
- Let's Give a Voice to Conversational Agents in Virtual Reality [2.7470819871568506]
We present an open-source architecture with the goal of simplifying the development of conversational agents in virtual environments.
We present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
arXiv Detail & Related papers (2023-08-04T18:51:38Z)
- ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human [76.62897301298699]
ChatPLUG is a Chinese open-domain dialogue system for digital human applications, instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format.
We show that ChatPLUG outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation.
We deploy ChatPLUG with fast inference in real-world applications such as smart speakers and instant messaging.
arXiv Detail & Related papers (2023-04-16T18:16:35Z)
- FaceChat: An Emotion-Aware Face-to-face Dialogue Framework [58.67608580694849]
FaceChat is a web-based dialogue framework that enables emotionally-sensitive and face-to-face conversations.
The system has a wide range of potential applications, including counseling, emotional support, and personalized customer service.
arXiv Detail & Related papers (2023-03-08T20:45:37Z)
- RealityTalk: Real-Time Speech-Driven Augmented Presentation for AR Live Storytelling [7.330145218077073]
We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements.
Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques.
We evaluate our tool from a presenter's perspective to demonstrate the effectiveness of our system.
arXiv Detail & Related papers (2022-08-12T16:12:00Z)
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behaviors of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.