Diverse Dance Synthesis via Keyframes with Transformer Controllers
- URL: http://arxiv.org/abs/2207.05906v1
- Date: Wed, 13 Jul 2022 00:56:46 GMT
- Title: Diverse Dance Synthesis via Keyframes with Transformer Controllers
- Authors: Junjun Pan, Siyuan Wang, Junxuan Bai, Ju Dai
- Abstract summary: We propose a novel keyframe-based motion generation network based on multiple constraints, which can achieve diverse dance synthesis via learned knowledge.
The backbone of our network is a hierarchical RNN module composed of two long short-term memory (LSTM) units, in which the first LSTM is utilized to embed the posture information of the historical frames into a latent space.
Our framework contains two Transformer-based controllers, which are used to model the constraints of the root trajectory and the velocity factor respectively.
- Score: 10.23813069057791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing keyframe-based motion synthesis mainly focuses on the generation of
cyclic actions or short-term motion, such as walking, running, and transitions
between close postures. However, these methods will significantly degrade the
naturalness and diversity of the synthesized motion when dealing with complex
and impromptu movements, e.g., dance performance and martial arts. In addition,
current research lacks fine-grained control over the generated motion, which is
essential for intelligent human-computer interaction and animation creation. In
this paper, we propose a novel keyframe-based motion generation network based
on multiple constraints, which can achieve diverse dance synthesis via learned
knowledge. Specifically, the algorithm is mainly formulated based on the
recurrent neural network (RNN) and the Transformer architecture. The backbone
of our network is a hierarchical RNN module composed of two long short-term
memory (LSTM) units, in which the first LSTM is utilized to embed the posture
information of the historical frames into a latent space, and the second one is
employed to predict the human posture for the next frame. Moreover, our
framework contains two Transformer-based controllers, which are used to model
the constraints of the root trajectory and the velocity factor respectively, so
as to better utilize the temporal context of the frames and achieve
fine-grained motion control. We verify the proposed approach on a dance dataset
containing a wide range of contemporary dance movements. The results of three
quantitative analyses validate the superiority of our algorithm. The video and
qualitative experimental results demonstrate that the complex motion sequences
generated by our algorithm can achieve diverse and smooth motion transitions
between keyframes, even for long-term synthesis.
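To make the described pipeline concrete, the following is a minimal PyTorch sketch of such a backbone: one LSTM embeds the posture history into a latent space, a second LSTM predicts the next-frame posture, and two Transformer encoders act as controllers for the root-trajectory and velocity constraints. All class names, layer sizes, pose dimensions, and the concatenation-based fusion of controller features are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only: shapes, sizes, and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class TransformerController(nn.Module):
    """Encodes a per-frame constraint signal (e.g. root trajectory or
    velocity factor) with self-attention over the temporal context."""
    def __init__(self, in_dim, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                    # x: (batch, frames, in_dim)
        return self.encoder(self.proj(x))    # (batch, frames, d_model)

class KeyframeDanceSynthesizer(nn.Module):
    """Hierarchical RNN backbone: LSTM-1 embeds the posture history into a
    latent space, LSTM-2 predicts the next-frame posture, conditioned on two
    Transformer controllers (root trajectory, velocity factor)."""
    def __init__(self, pose_dim=63, traj_dim=3, vel_dim=1, latent=256, d_ctrl=128):
        super().__init__()
        self.embed_lstm = nn.LSTM(pose_dim, latent, batch_first=True)
        self.traj_ctrl = TransformerController(traj_dim, d_ctrl)
        self.vel_ctrl = TransformerController(vel_dim, d_ctrl)
        self.pred_lstm = nn.LSTM(latent + 2 * d_ctrl, latent, batch_first=True)
        self.out = nn.Linear(latent, pose_dim)

    def forward(self, past_poses, root_traj, velocity):
        h, _ = self.embed_lstm(past_poses)            # posture history -> latent
        c = torch.cat([self.traj_ctrl(root_traj),
                       self.vel_ctrl(velocity)], dim=-1)
        h, _ = self.pred_lstm(torch.cat([h, c], dim=-1))
        return self.out(h[:, -1])                     # predicted next-frame posture

# Usage: 30 historical frames, 63-D poses, 3-D root trajectory, scalar velocity.
model = KeyframeDanceSynthesizer()
next_pose = model(torch.randn(2, 30, 63), torch.randn(2, 30, 3), torch.randn(2, 30, 1))
print(next_pose.shape)  # torch.Size([2, 63])
```

In this sketch the controller features are simply concatenated with the latent history before the prediction LSTM; the paper may fuse the constraint and posture streams differently.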
Related papers
- Shape Conditioned Human Motion Generation with Diffusion Model [0.0]
We propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format.
We also propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain.
arXiv Detail & Related papers (2024-05-10T19:06:41Z) - DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z) - TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z) - Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures pre-trained on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z) - GANimator: Neural Motion Synthesis from a Single Sequence [38.361579401046875]
We present GANimator, a generative model that learns to synthesize novel motions from a single, short motion sequence.
GANimator generates motions that resemble the core elements of the original motion, while simultaneously synthesizing novel and diverse movements.
We show a number of applications, including crowd simulation, key-frame editing, style transfer, and interactive control, which all learn from a single input sequence.
arXiv Detail & Related papers (2022-05-05T13:04:14Z) - Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement; a minimal sketch of this measurement model appears after this list.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z) - Towards Lightweight Neural Animation: Exploration of Neural Network Pruning in Mixture of Experts-based Animation Models [3.1733862899654652]
We apply pruning algorithms to compress a neural network in the context of interactive character animation.
With the same number of experts and parameters, the pruned model produces fewer motion artifacts than the dense model.
arXiv Detail & Related papers (2022-01-11T16:39:32Z) - Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE [26.13198266911874]
We propose a novel video generation approach that learns separate distributions for motion and appearance.
We employ a two-stage approach where the first stage converts a noise vector to a sequence of keypoints in arbitrary frame rates, and the second stage synthesizes videos based on the given keypoints sequence and the appearance noise vector.
arXiv Detail & Related papers (2021-12-21T03:30:38Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
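For the video SCI entry above, here is a minimal NumPy sketch of the standard snapshot-compressive-imaging measurement model, assumed from the general SCI literature rather than taken from the cited paper's code: T frames are modulated by per-frame binary masks and summed into a single 2D measurement, which the reconstruction network must then invert.

```python
# Standard video SCI forward model (illustrative assumption): Y = sum_t C_t * X_t.
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 8, 64, 64                        # frames per measurement, spatial size
frames = rng.random((T, H, W))             # the (unknown) video block X_t
masks = rng.integers(0, 2, (T, H, W))      # coded-aperture modulation C_t

measurement = (masks * frames).sum(axis=0) # single 2D snapshot captured by the detector
print(measurement.shape)                   # (64, 64)
```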