TPA-Net: Generate A Dataset for Text to Physics-based Animation
- URL: http://arxiv.org/abs/2211.13887v1
- Date: Fri, 25 Nov 2022 04:26:41 GMT
- Title: TPA-Net: Generate A Dataset for Text to Physics-based Animation
- Authors: Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, Chenfanfu Jiang
- Abstract summary: We present an autonomous data generation technique and a dataset intended to narrow the gap with a large amount of multi-modal, 3D Text-to-Video/Simulation (T2V/S) data.
We take advantage of state-of-the-art physical simulation methods to simulate diverse scenarios such as elastic deformations, material fractures, collisions, and turbulence.
High-quality, multi-view rendered videos are supplied for the benefit of the T2V, Neural Radiance Fields (NeRF), and other communities.
- Score: 27.544423833402572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent breakthroughs in Vision-Language (V&L) joint research have achieved
remarkable results in various text-driven tasks. High-quality Text-to-Video
(T2V), a task long considered mission-impossible, has been shown to be feasible
with reasonably good results in recent works. However, the resulting videos
often contain undesired artifacts, largely because these systems are purely
data-driven and agnostic to physical laws. To tackle this issue and further
push T2V towards high-level physical realism, we present an autonomous data
generation technique and a dataset intended to narrow the gap with a large
amount of multi-modal, 3D Text-to-Video/Simulation (T2V/S) data. In the
dataset, we provide high-resolution 3D physical simulations for both solids and
fluids, along with textual descriptions of the physical phenomena. We take
advantage of state-of-the-art physical simulation methods, (i) Incremental
Potential Contact (IPC) and (ii) the Material Point Method (MPM), to simulate
diverse scenarios such as elastic deformations, material fractures, collisions,
and turbulence. Additionally, high-quality, multi-view rendered videos are
supplied for the benefit of the T2V, Neural Radiance Fields (NeRF), and other
communities. This work is the first step towards fully automated
Text-to-Video/Simulation (T2V/S). Live examples and subsequent work are
available at https://sites.google.com/view/tpa-net.
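The abstract describes each dataset entry as pairing a physics simulation (solved with IPC or MPM), a textual description of the phenomenon, and multi-view rendered videos, but it does not specify an on-disk format. The following is a minimal, hypothetical Python sketch of what one such multi-modal T2V/S sample could contain; all class names, field names, and values are illustrative assumptions, not the actual TPA-Net schema.

```python
# Hypothetical sketch of one multi-modal T2V/S sample as described in the
# abstract: a physics simulation setup (IPC for contact-rich solids, MPM for
# solids, fractures, and fluids), a textual description of the phenomenon, and
# multi-view rendered videos. Names and values are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Solver(str, Enum):
    IPC = "incremental_potential_contact"   # contact-accurate solid simulation
    MPM = "material_point_method"           # solids, fractures, fluids


@dataclass
class SimulationConfig:
    solver: Solver
    scenario: str                            # e.g. "elastic_deformation",
                                             # "fracture", "collision", "turbulence"
    material_params: Dict[str, float] = field(default_factory=dict)
    frames: int = 240
    dt: float = 1.0 / 60.0                   # simulation time step (seconds)


@dataclass
class T2VSSample:
    text: str                                # textual description of the phenomenon
    sim: SimulationConfig
    mesh_sequence_dir: str                   # per-frame 3D geometry (simulation/NeRF use)
    video_paths: List[str]                   # one rendered video per camera view


# Example with purely illustrative values:
sample = T2VSSample(
    text="A soft rubber ball collides with a wall and bounces back, deforming on impact.",
    sim=SimulationConfig(
        solver=Solver.IPC,
        scenario="collision",
        material_params={"youngs_modulus": 1e5, "poisson_ratio": 0.4, "density": 1000.0},
    ),
    mesh_sequence_dir="scenes/ball_wall_001/meshes",
    video_paths=[f"scenes/ball_wall_001/renders/view_{i:02d}.mp4" for i in range(4)],
)
print(sample.text, sample.sim.solver.value)
```

Grouping the solver choice and material parameters together with the text prompt and per-view renders mirrors how the abstract couples simulation, language, and multi-view video within a single sample.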
Related papers
- WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation [43.71082938654985]
We introduce the World Simulator Assistant (WISA), an effective framework for decomposing and incorporating physical principles into T2V models.
WISA decomposes physical principles into textual physical descriptions, qualitative physical categories, and quantitative physical properties (a minimal illustrative sketch of such an annotation follows this list).
We propose a novel video dataset, WISA-32K, collected based on qualitative physical categories.
arXiv Detail & Related papers (2025-03-11T08:10:03Z)
- Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos [61.925837909969815]
We introduce Video2Policy, a novel framework that leverages internet RGB videos to reconstruct tasks based on everyday human behavior.
Our method can successfully train RL policies on such tasks, including complex and challenging tasks such as throwing.
We show that the generated simulation data can be scaled up for training a general policy, and it can be transferred back to the real robot in a Real2Sim2Real way.
arXiv Detail & Related papers (2025-02-14T03:22:03Z)
- InTraGen: Trajectory-controlled Video Generation for Object Interactions [100.79494904451246]
InTraGen is a pipeline for improved trajectory-based generation of object interaction scenarios.
Our results demonstrate improvements in both visual fidelity and quantitative performance.
arXiv Detail & Related papers (2024-11-25T14:27:50Z)
- Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation [9.306758077479472]
We introduce a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation.
This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios.
arXiv Detail & Related papers (2024-11-21T18:55:23Z)
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z)
- T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data [6.6240820702899565]
Existing methods only generate body motion data, excluding facial expressions and hand movements.
Recent attempts to create such a dataset have resulted in motion inconsistency among different body parts.
We propose T2M-X, a two-stage method that learns expressive text-to-motion generation from partially annotated data.
arXiv Detail & Related papers (2024-09-20T06:20:00Z)
- VIMI: Grounding Video Generation through Multi-modal Instruction [89.90065445082442]
Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining.
We construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts.
We finetune the model from the first stage on three video generation tasks, incorporating multi-modal instructions.
arXiv Detail & Related papers (2024-07-08T18:12:49Z)
- Make-A-Video: Text-to-Video Generation without Text-Video Data [69.20996352229422]
Make-A-Video is an approach for translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules.
In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation.
arXiv Detail & Related papers (2022-09-29T13:59:46Z)
- Make It Move: Controllable Image-to-Video Generation with Text Descriptions [69.52360725356601]
The TI2V task aims at generating videos from a static image and a text description.
To address the challenges of this task, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor structure.
Experiments conducted on datasets verify the effectiveness of MAGE and show the appealing potential of the TI2V task.
arXiv Detail & Related papers (2021-12-06T07:00:36Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in the fields of Machine Learning and Computer Vision.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
- Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z)
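The WISA entry above notes that physical principles are decomposed into textual physical descriptions, qualitative physical categories, and quantitative physical properties. As forward-referenced there, the sketch below is a minimal, hypothetical Python illustration of such a three-part annotation; the class and field names are assumptions and do not reflect WISA-32K's actual format.

```python
# Hypothetical three-part physics annotation in the spirit of the WISA summary
# above: a textual description, a qualitative category, and quantitative
# properties. Names and values are illustrative assumptions only.
from dataclasses import dataclass
from typing import Dict


@dataclass
class PhysicsAnnotation:
    textual_description: str                    # free-text statement of the physical principle
    qualitative_category: str                   # e.g. "rigid-body dynamics", "fluid flow"
    quantitative_properties: Dict[str, float]   # measurable quantities


annotation = PhysicsAnnotation(
    textual_description="A dropped object accelerates downward under gravity "
                        "before striking the ground.",
    qualitative_category="rigid-body dynamics",
    quantitative_properties={"gravity_m_per_s2": 9.81, "drop_height_m": 1.5},
)
print(annotation.qualitative_category)
```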
This list is automatically generated from the titles and abstracts of the papers on this site.