Real-time Animation Generation and Control on Rigged Models via Large
  Language Models
- URL: http://arxiv.org/abs/2310.17838v2
- Date: Thu, 15 Feb 2024 18:56:41 GMT
- Title: Real-time Animation Generation and Control on Rigged Models via Large
  Language Models
- Authors: Han Huang, Fernanda De La Torre, Cathy Mengying Fang, Andrzej
  Banburski-Fahey, Judith Amores, Jaron Lanier
- Abstract summary: We introduce a novel method for real-time animation control and generation on rigged models using natural language input.
We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
- Score: 50.034712575541434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel method for real-time animation control and generation on
rigged models using natural language input. First, we embed a large language
model (LLM) in Unity to output structured texts that can be parsed into diverse
and realistic animations. Second, we illustrate the LLM's potential to enable
flexible state transitions between existing animations. We showcase the
robustness of our approach through qualitative results on various rigged models
and motions.
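
The abstract's idea of structured text that is parsed into animation can be illustrated with a minimal sketch. The JSON schema, the bone name, and the helper names `parse_tracks` and `sample` are all hypothetical; the listing above does not describe the paper's actual output format, and the paper's system runs inside Unity, not Python. The sketch parses a keyframe description and linearly interpolates a bone rotation at a given time.

```python
import json
from bisect import bisect_right

# Hypothetical LLM reply (assumed schema, not the paper's format):
# one track per bone, each a list of timed Euler rotations in degrees.
LLM_REPLY = """
{"tracks": [
  {"bone": "RightUpperArm",
   "keyframes": [{"t": 0.0, "rot": [0, 0, 0]},
                 {"t": 1.0, "rot": [0, 0, 90]}]}
]}
"""

def parse_tracks(text):
    """Parse the structured reply into {bone: [(time, (x, y, z)), ...]} sorted by time."""
    data = json.loads(text)
    tracks = {}
    for track in data["tracks"]:
        keys = sorted((float(k["t"]), tuple(float(v) for v in k["rot"]))
                      for k in track["keyframes"])
        if not keys:
            raise ValueError(f"track for {track['bone']!r} has no keyframes")
        tracks[track["bone"]] = keys
    return tracks

def sample(keys, t):
    """Linearly interpolate the rotation at time t, clamping outside the keyed range."""
    times = [k[0] for k in keys]
    if t <= times[0]:
        return keys[0][1]
    if t >= times[-1]:
        return keys[-1][1]
    i = bisect_right(times, t)  # first keyframe strictly after t
    (t0, r0), (t1, r1) = keys[i - 1], keys[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(r0, r1))

tracks = parse_tracks(LLM_REPLY)
pose = sample(tracks["RightUpperArm"], 0.5)  # rotation halfway between the two keyframes
```

Interpolating Euler angles component-wise is a simplification chosen for brevity; an engine such as Unity would typically blend rotations as quaternions.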
 
      
Related papers
        - X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
 X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image.
It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
 arXiv (2025-01-17T08:10:53Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
 MoTrans is a customized motion transfer method enabling video generation of similar motion in new contexts.
Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion patterns from single or multiple reference videos.
 arXiv (2024-12-02T10:07:59Z)
- MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
 MoRAG is a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
 arXiv (2024-09-18T17:03:30Z)
- Towards Multi-Task Multi-Modal Models: A Video Generative Perspective [5.495245220300184]
 This thesis chronicles our endeavor to build multi-task models for generating videos and other modalities under diverse conditions.
We unveil a novel approach to mapping bidirectionally between visual observation and interpretable lexical terms.
Our scalable visual token representation proves beneficial across generation, compression, and understanding tasks.
 arXiv (2024-05-26T23:56:45Z)
- LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation [62.232361821779335]
 We introduce a tuning-free attention control framework, encapsulated by the progressive process of prompt-Aware editing, StablE animation geneRation, abbreviated as LASER.
We manipulate the model's spatial features and self-attention mechanisms to maintain animation integrity.
Our meticulous control over spatial features and self-attention ensures structural consistency in the images.
 arXiv (2024-04-21T07:13:56Z)
- Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
 We propose AnyRef, a general MLLM that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references.
Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
 arXiv (2024-03-05T13:45:46Z)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
  without Specific Tuning [92.33690050667475]
 AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns.
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
 arXiv (2023-07-10T17:34:16Z)
- Make-An-Animation: Large-Scale Text-conditional 3D Human Motion
  Generation [47.272177594990104]
 We introduce Make-An-Animation, a text-conditioned human motion generation model.
It learns more diverse poses and prompts from large-scale image-text datasets.
It reaches state-of-the-art performance on text-to-motion generation.
 arXiv (2023-05-16T17:58:43Z)
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
 We propose a diffusion-based motion synthesis and editing model named FLAME.
 FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
 arXiv (2022-09-01T10:34:57Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
 We study a novel task, language-guided face animation, which aims to animate a static face image with the help of language.
We propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
 arXiv (2022-08-11T02:57:30Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
 We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
 arXiv (2022-04-25T14:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     