Related papers: Real-time Animation Generation and Control on Rigged Models via Large Language Models

Real-time Animation Generation and Control on Rigged Models via Large Language Models

URL: http://arxiv.org/abs/2310.17838v2
Date: Thu, 15 Feb 2024 18:56:41 GMT
Title: Real-time Animation Generation and Control on Rigged Models via Large Language Models
Authors: Han Huang, Fernanda De La Torre, Cathy Mengying Fang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier
Abstract summary: We introduce a novel method for real-time animation control and generation on rigged models using natural language input. We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
Score: 50.034712575541434
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate LLM's potential to enable flexible state transition between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.

Related papers

X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image. It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new context. multimodal representations from recaptioned prompt and video frames promote the modeling of appearance. Our method effectively learns specific motion pattern from singular or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. We create diverse samples through the spatial composition of the retrieved motions. Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z)
Towards Multi-Task Multi-Modal Models: A Video Generative Perspective [5.495245220300184]
This thesis chronicles our endeavor to build multi-task models for generating videos and other modalities under diverse conditions. We unveil a novel approach to mapping bidirectionally between visual observation and interpretable lexical terms. Our scalable visual token representation proves beneficial across generation, compression, and understanding tasks.
arXiv Detail & Related papers (2024-05-26T23:56:45Z)
LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation [62.232361821779335]
We introduce a tuning-free attention control framework, encapsulated by the progressive process of prompt-Aware editing, StablE animation geneRation, abbreviated as LASER. We manipulate the model's spatial features and self-attention mechanisms to maintain animation integrity. Our meticulous control over spatial features and self-attention ensures structural consistency in the images.
arXiv Detail & Related papers (2024-04-21T07:13:56Z)
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose bfAnyRef, a general MLLM model that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references. Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z)
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [92.33690050667475]
AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning. We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns. Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
arXiv Detail & Related papers (2023-07-10T17:34:16Z)
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation [47.272177594990104]
We introduce Make-An-Animation, a text-conditioned human motion generation model. It learns more diverse poses and prompts from large-scale image-text datasets. It reaches state-of-the-art performance on text-to-motion generation.
arXiv Detail & Related papers (2023-05-16T17:58:43Z)
FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME. FLAME can generate high-fidelity motions well aligned with the given text. It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages. We propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)
TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions. We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data. We show that TEMOS framework can produce both skeleton-based animations as in prior work, as well more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.