Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed
Video
- URL: http://arxiv.org/abs/2303.16053v2
- Date: Mon, 21 Aug 2023 14:18:55 GMT
- Title: Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed
Video
- Authors: Wenzheng Zeng, Yang Xiao, Sicheng Wei, Jinfang Gan, Xintao Zhang,
Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou
- Abstract summary: Real-time eyeblink detection in the wild can widely serve for fatigue detection, face anti-spoofing, emotion analysis, etc.
We shed light on this research field for the first time with essential contributions on dataset, theory, and practices.
- Score: 41.4300990443683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time eyeblink detection in the wild can widely serve for fatigue
detection, face anti-spoofing, emotion analysis, etc. The existing research
efforts generally focus on single-person cases towards trimmed video. However,
the multi-person scenario within untrimmed videos is also important for practical
applications, yet it has not received adequate attention. To address this, we shed
light on this research field for the first time with essential contributions on
dataset, theory, and practices. In particular, a large-scale dataset termed
MPEblink that involves 686 untrimmed videos with 8748 eyeblink events is
proposed under multi-person conditions. The samples are captured from
unconstrained films to reveal "in the wild" characteristics. Meanwhile, a
real-time multi-person eyeblink detection method is also proposed. Unlike
existing counterparts, our method runs in a one-stage spatio-temporal manner
with end-to-end learning capacity. Specifically, it
simultaneously addresses the sub-tasks of face detection, face tracking, and
human instance-level eyeblink detection. This paradigm holds 2 main advantages:
(1) eyeblink features can be facilitated via the face's global context (e.g.,
head pose and illumination condition) with joint optimization and interaction,
and (2) addressing these sub-tasks in parallel rather than sequentially saves
considerable time, meeting the real-time running requirement. Experiments on
MPEblink verify the essential challenges of real-time multi-person eyeblink
detection in the wild for untrimmed video. Our method also outperforms existing
approaches by large margins and with a high inference speed.
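The one-stage paradigm described in the abstract — one shared spatio-temporal backbone feeding the face-detection, face-tracking, and eyeblink-detection sub-tasks in parallel — can be illustrated with a minimal toy sketch. This is not the paper's actual architecture; all names, shapes, and head computations below are hypothetical stand-ins meant only to show the shared-backbone, parallel-heads structure:

```python
import numpy as np

def backbone(clip):
    """Toy shared spatio-temporal feature: mean over time and space.

    clip has shape (T, H, W, C); the result is one (C,) clip feature.
    """
    return clip.mean(axis=(0, 1, 2))

def face_head(feat):
    """Toy face-detection score derived from the shared features."""
    return float(feat.sum())

def track_head(feat, prev_feat):
    """Toy tracking affinity: cosine similarity with the previous clip's feature."""
    num = float(feat @ prev_feat)
    den = float(np.linalg.norm(feat) * np.linalg.norm(prev_feat)) + 1e-8
    return num / den

def blink_head(feat):
    """Toy instance-level eyeblink logit from the same shared features."""
    return float(feat.mean())

def one_stage_forward(clip, prev_feat):
    """All three heads reuse a single backbone pass (parallel sub-tasks),
    instead of chaining detector -> tracker -> blink classifier sequentially."""
    feat = backbone(clip)
    return {
        "face_score": face_head(feat),
        "track_affinity": track_head(feat, prev_feat),
        "blink_logit": blink_head(feat),
    }

clip = np.ones((8, 4, 4, 3))  # T=8 frames of 4x4 "images" with 3 channels
prev_feat = np.ones(3)        # feature of the previous clip, for tracking
out = one_stage_forward(clip, prev_feat)
print(out)
```

Because the heads share one backbone pass and exchange the face's global context through `feat`, joint optimization of all three sub-tasks is possible — the design motivation the abstract gives for the one-stage formulation.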
Related papers
- Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection [22.536129731902783]
We propose a Latent Spatiotemporal Adaptation (LAST) approach to facilitate generalized face forgery video detection.
We first model the spatiotemporal patterns of face videos by incorporating a lightweight CNN to extract local spatial features of each frame.
Then we learn long-term spatiotemporal representations in the latent space of videos, which should contain more clues than the pixel space.
arXiv Detail & Related papers (2023-09-09T13:40:44Z) - Spatiotemporal Pyramidal CNN with Depth-Wise Separable Convolution for
Eye Blinking Detection in the Wild [0.0]
Eye blinking detection plays an essential role in deception detection, driving fatigue detection, etc.
Two problems are addressed: how the eye blinking detection model can learn efficiently from different resolutions of eye pictures in diverse conditions; and how to reduce the size of the detection model for faster inference time.
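The depth-wise separable convolution mentioned above shrinks a model by factoring a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise mix. A small parameter-count comparison makes the saving concrete; the layer sizes below are illustrative, not taken from the paper:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, plus a 1x1 pointwise conv
    that mixes channels (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)        # 3*3*64*128 = 73728
sep = depthwise_separable_params(3, 64, 128)  # 3*3*64 + 64*128 = 8768
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction here
```

The reduction factor grows with kernel size and output channels, which is why the technique suits models that must run with fast inference times.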
arXiv Detail & Related papers (2023-06-20T04:59:09Z) - Detection of Real-time DeepFakes in Video Conferencing with Active
Probing and Corneal Reflection [43.272069005626584]
We describe a new active forensic method to detect real-time DeepFakes.
We authenticate video calls by displaying a distinct pattern on the screen and using the corneal reflection extracted from the images of the call participant's face.
This pattern can be induced by a call participant displaying on a shared screen or directly integrated into the video-call client.
arXiv Detail & Related papers (2022-10-21T23:31:17Z) - Multi-view Tracking Using Weakly Supervised Human Motion Prediction [60.972708589814125]
We argue that an even more effective approach is to predict people motion over time and infer people's presence in individual frames from these.
This enables to enforce consistency both over time and across views of a single temporal frame.
We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-10-19T17:58:23Z) - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z) - TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training
Model [51.14840210957289]
Multi-object tracking is a fundamental vision problem that has been studied for a long time.
Despite the success of Tracking by Detection (TBD), this two-step method is too complicated to train in an end-to-end manner.
We propose a concise end-to-end model TubeTK which only needs one-step training by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip.
arXiv Detail & Related papers (2020-06-10T06:45:05Z) - Deep Frequent Spatial Temporal Learning for Face Anti-Spoofing [9.435020319411311]
Face anti-spoofing is crucial for the security of face recognition systems, as it prevents invasion via presentation attacks.
Previous works have shown the effectiveness of using depth and temporal supervision for this task.
We propose a novel two-stream FreqSpatialTemporalNet for face anti-spoofing which simultaneously takes advantage of frequency, spatial, and temporal information.
arXiv Detail & Related papers (2020-01-20T06:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.