Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed
Video
- URL: http://arxiv.org/abs/2303.16053v2
- Date: Mon, 21 Aug 2023 14:18:55 GMT
- Title: Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed
Video
- Authors: Wenzheng Zeng, Yang Xiao, Sicheng Wei, Jinfang Gan, Xintao Zhang,
Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou
- Abstract summary: Real-time eyeblink detection in the wild can widely serve for fatigue detection, face anti-spoofing, emotion analysis, etc.
We shed light on this research field for the first time with essential contributions on dataset, theory, and practices.
- Score: 41.4300990443683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time eyeblink detection in the wild can widely serve for fatigue
detection, face anti-spoofing, emotion analysis, etc. The existing research
efforts generally focus on single-person cases towards trimmed video. However,
the multi-person scenario within untrimmed videos is also important for practical
applications, yet it has not received adequate attention. To address this, we shed
light on this research field for the first time with essential contributions on
dataset, theory, and practices. In particular, a large-scale dataset termed
MPEblink that involves 686 untrimmed videos with 8748 eyeblink events is
proposed under multi-person conditions. The samples are captured from
unconstrained films to reveal "in the wild" characteristics. Meanwhile, a
real-time multi-person eyeblink detection method is also proposed. Unlike
existing counterparts, our method runs in a one-stage spatio-temporal manner
with end-to-end learning capacity. Specifically, it
simultaneously addresses the sub-tasks of face detection, face tracking, and
human instance-level eyeblink detection. This paradigm holds 2 main advantages:
(1) eyeblink features can be facilitated via the face's global context (e.g.,
head pose and illumination condition) with joint optimization and interaction,
and (2) addressing these sub-tasks in parallel rather than sequentially saves
considerable time, meeting the real-time running requirement. Experiments on
MPEblink verify the essential challenges of real-time multi-person eyeblink
detection in the wild for untrimmed video. Our method also outperforms existing
approaches by large margins and with a high inference speed.
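The one-stage paradigm described in the abstract — one shared spatio-temporal backbone feeding the face-detection, face-tracking, and eyeblink-detection sub-tasks in parallel — can be illustrated with a minimal toy sketch. This is not the paper's actual architecture; all names, shapes, and head computations below are hypothetical stand-ins meant only to show the shared-backbone, parallel-heads structure:

```python
import numpy as np

def backbone(clip):
    """Toy shared spatio-temporal feature: mean over time and space.

    clip has shape (T, H, W, C); the result is one (C,) clip feature.
    """
    return clip.mean(axis=(0, 1, 2))

def face_head(feat):
    """Toy face-detection score derived from the shared features."""
    return float(feat.sum())

def track_head(feat, prev_feat):
    """Toy tracking affinity: cosine similarity with the previous clip's feature."""
    num = float(feat @ prev_feat)
    den = float(np.linalg.norm(feat) * np.linalg.norm(prev_feat)) + 1e-8
    return num / den

def blink_head(feat):
    """Toy instance-level eyeblink logit from the same shared features."""
    return float(feat.mean())

def one_stage_forward(clip, prev_feat):
    """All three heads reuse a single backbone pass (parallel sub-tasks),
    instead of chaining detector -> tracker -> blink classifier sequentially."""
    feat = backbone(clip)
    return {
        "face_score": face_head(feat),
        "track_affinity": track_head(feat, prev_feat),
        "blink_logit": blink_head(feat),
    }

clip = np.ones((8, 4, 4, 3))  # T=8 frames of 4x4 "images" with 3 channels
prev_feat = np.ones(3)        # feature of the previous clip, for tracking
out = one_stage_forward(clip, prev_feat)
print(out)
```

Because the heads share one backbone pass and exchange the face's global context through `feat`, joint optimization of all three sub-tasks is possible — the design motivation the abstract gives for the one-stage formulation.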
Related papers
- Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection [22.536129731902783]
We propose a Latent Spatiotemporal Adaptation (LAST) approach to facilitate generalized face forgery video detection.
We first model the spatiotemporal patterns of face videos by incorporating a lightweight CNN to extract local spatial features of each frame.
Then we learn long-term spatiotemporal representations in the latent space of videos, which should contain more clues than the pixel space.
arXiv Detail & Related papers (2023-09-09T13:40:44Z) - Spatiotemporal Pyramidal CNN with Depth-Wise Separable Convolution for
Eye Blinking Detection in the Wild [0.0]
Eye blinking detection plays an essential role in deception detection, driving fatigue detection, etc.
Two problems are addressed: how the eye blinking detection model can learn efficiently from different resolutions of eye pictures in diverse conditions; and how to reduce the size of the detection model for faster inference time.
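The depth-wise separable convolution mentioned above shrinks a model by factoring a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise mix. A small parameter-count comparison makes the saving concrete; the layer sizes below are illustrative, not taken from the paper:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, plus a 1x1 pointwise conv
    that mixes channels (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)        # 3*3*64*128 = 73728
sep = depthwise_separable_params(3, 64, 128)  # 3*3*64 + 64*128 = 8768
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction here
```

The reduction factor grows with kernel size and output channels, which is why the technique suits models that must run with fast inference times.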
arXiv Detail & Related papers (2023-06-20T04:59:09Z) - Detection of Real-time DeepFakes in Video Conferencing with Active
Probing and Corneal Reflection [43.272069005626584]
We describe a new active forensic method to detect real-time DeepFakes.
We authenticate video calls by displaying a distinct pattern on the screen and using the corneal reflection extracted from the images of the call participant's face.
This pattern can be induced by a call participant displaying on a shared screen or directly integrated into the video-call client.
arXiv Detail & Related papers (2022-10-21T23:31:17Z) - Multi-view Tracking Using Weakly Supervised Human Motion Prediction [60.972708589814125]
We argue that an even more effective approach is to predict people motion over time and infer people's presence in individual frames from these.
This enables to enforce consistency both over time and across views of a single temporal frame.
We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-10-19T17:58:23Z) - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z) - TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training
Model [51.14840210957289]
Multi-object tracking is a fundamental vision problem that has been studied for a long time.
Despite the success of Tracking by Detection (TBD), this two-step method is too complicated to train in an end-to-end manner.
We propose a concise end-to-end model TubeTK which only needs one-step training by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip.
arXiv Detail & Related papers (2020-06-10T06:45:05Z) - Deep Frequent Spatial Temporal Learning for Face Anti-Spoofing [9.435020319411311]
Face anti-spoofing is crucial for the security of face recognition systems, as it prevents invasion via presentation attacks.
Previous works have shown the effectiveness of using depth and temporal supervision for this task.
We propose a novel two-stream FreqSpatialTemporalNet for face anti-spoofing which simultaneously takes advantage of frequency, spatial, and temporal information.
arXiv Detail & Related papers (2020-01-20T06:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.