Detection of (Hidden) Emotions from Videos using Muscles Movements and
Face Manifold Embedding
- URL: http://arxiv.org/abs/2211.00233v1
- Date: Tue, 1 Nov 2022 02:48:35 GMT
- Title: Detection of (Hidden) Emotions from Videos using Muscles Movements and
Face Manifold Embedding
- Authors: Juni Kim, Zhikang Dong, Eric Guan, Judah Rosenthal, Shi Fu, Miriam
Rafailovich, Pawel Polak
- Abstract summary: We provide a new non-invasive, easy-to-scale method for (hidden) emotion detection from videos of human faces.
Our approach combines face manifold detection for accurate location of the face in the video with local face manifold embedding.
In the next step, we employ the Digital Image Speckle Correlation (DISC) and the optical flow algorithm to compute the pattern of micro-movements in the face.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a new non-invasive, remotely accessible method for
(hidden) emotion detection from videos of human faces that scales easily to
large numbers of subjects. Our approach combines face manifold detection for accurate
location of the face in the video with local face manifold embedding to create
a common domain for the measurements of muscle micro-movements that is
invariant to the movement of the subject in the video. In the next step, we
employ the Digital Image Speckle Correlation (DISC) and the optical flow
algorithm to compute the pattern of micro-movements in the face. The
corresponding vector field is mapped back to the original space and
superimposed on the original frames of the videos. Hence, the resulting videos
include additional information about the direction of the movement of the
muscles in the face. We take the publicly available CK++ dataset of visible
emotions and add to it videos of the same format but with hidden emotions. We
process all the videos using our micro-movement detection and use the results
to train a state-of-the-art network for emotions classification from videos --
Frame Attention Network (FAN). Although the original FAN model achieves very
high out-of-sample performance on the original CK++ videos, it does not perform
as well on the hidden-emotion videos. The performance improves significantly when
the model is trained and tested on videos with the vector fields of muscle
movements. Intuitively, the corresponding arrows serve as edges in the image
that are easily captured by the convolution filters in the FAN network.
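As a rough illustration of the pipeline described in the abstract, the sketch below approximates it with off-the-shelf OpenCV components: a Haar-cascade face detector stands in for face manifold detection, a fixed-size face crop stands in for the common domain produced by the manifold embedding, and Farneback dense optical flow stands in for the DISC / optical-flow micro-movement estimation, with the resulting vector field drawn as arrows on the face crop in the spirit of the superimposed videos fed to FAN. This is a minimal sketch under those assumptions, not the authors' implementation; the names and values FACE_SIZE, ARROW_STRIDE, and ARROW_SCALE are illustrative.

```python
# Minimal sketch (not the authors' code): approximate the paper's pipeline with
# off-the-shelf OpenCV pieces -- a Haar-cascade face detector in place of face
# manifold detection, a fixed-size face crop in place of the manifold
# embedding's common domain, and Farneback dense optical flow in place of
# DISC-based micro-movement estimation.
import cv2
import numpy as np

FACE_SIZE = (224, 224)     # illustrative common domain: every face is warped to this grid
ARROW_STRIDE = 12          # draw one arrow per 12x12 block of pixels
ARROW_SCALE = 4.0          # exaggerate micro-movements so the arrows are visible

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame):
    """Detect the largest face and warp it to the common domain."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(gray[y:y + h, x:x + w], FACE_SIZE)

def flow_overlay(video_path):
    """Yield face crops with the micro-movement field drawn as arrows."""
    cap = cv2.VideoCapture(video_path)
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        face = crop_face(frame)
        if face is None:
            continue
        if prev is not None:
            # Dense flow between consecutive aligned face crops.
            flow = cv2.calcOpticalFlowFarneback(
                prev, face, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            vis = cv2.cvtColor(face, cv2.COLOR_GRAY2BGR)
            for y in range(0, FACE_SIZE[1], ARROW_STRIDE):
                for x in range(0, FACE_SIZE[0], ARROW_STRIDE):
                    dx, dy = flow[y, x]
                    tip = (int(x + ARROW_SCALE * dx), int(y + ARROW_SCALE * dy))
                    cv2.arrowedLine(vis, (x, y), tip, (0, 255, 0), 1, tipLength=0.3)
            yield vis
        prev = face
    cap.release()
```

In this reading, the frames yielded by flow_overlay would replace the raw frames when training an emotion classifier such as FAN, so that the drawn arrows act as the edge-like cues described above.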
Related papers
- Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos.
Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm.
We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z) - Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose to use a pre-trained image-to-video model to disentangle appearance from motion.
Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.
By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity.
Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks.
arXiv Detail & Related papers (2024-08-01T10:55:20Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z) - Masked Motion Encoding for Self-Supervised Video Representation Learning [84.24773072241945]
We present Masked Motion Encoding (MME), a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues.
Motivated by the fact that humans are able to recognize an action by tracking objects' position changes and shape changes, we propose to reconstruct a motion trajectory that represents these two kinds of change in the masked regions.
Pre-trained with our MME paradigm, the model is able to anticipate long-term and fine-grained motion details.
arXiv Detail & Related papers (2022-10-12T11:19:55Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Guess What Moves: Unsupervised Video and Image Segmentation by
Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos [23.153335327822685]
We learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options.
We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations.
We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset.
arXiv Detail & Related papers (2020-05-31T15:46:01Z)