Real-time Face Mask Detection in Video Data
- URL: http://arxiv.org/abs/2105.01816v1
- Date: Wed, 5 May 2021 01:03:34 GMT
- Title: Real-time Face Mask Detection in Video Data
- Authors: Yuchen Ding, Zichen Li, David Yastremsky
- Abstract summary: We present a robust deep learning pipeline that is capable of identifying correct and incorrect mask-wearing from real-time video streams.
We devised two separate approaches and evaluated their performance and run-time efficiency.
- Score: 0.5371337604556311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In response to the ongoing COVID-19 pandemic, we present a robust deep
learning pipeline that is capable of identifying correct and incorrect
mask-wearing from real-time video streams. To accomplish this goal, we devised
two separate approaches and evaluated their performance and run-time
efficiency. The first approach leverages a pre-trained face detector in
combination with a mask-wearing image classifier trained on a large-scale
synthetic dataset. The second approach utilizes a state-of-the-art object
detection network to perform localization and classification of faces in one
shot, fine-tuned on a small set of labeled real-world images. The first
pipeline achieved a test accuracy of 99.97% on the synthetic dataset and
maintained 6 FPS running on video data. The second pipeline achieved a mAP(0.5)
of 89% on real-world images while sustaining 52 FPS on video data. We have
concluded that if a larger dataset with bounding-box labels can be curated,
this task is best suited using object detection architectures such as YOLO and
SSD due to their superior inference speed and satisfactory performance on key
evaluation metrics.
Related papers
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) needs to concern the high across-frame variation in object appearance, and the diverse deterioration in some frames.
Most of contemporary aggregation methods are tailored for two-stage detectors, suffering from high computational costs.
This study invents a very simple yet potent strategy of feature selection and aggregation, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - SimulFlow: Simultaneously Extracting Feature and Identifying Target for
Unsupervised Video Object Segmentation [28.19471998380114]
Unsupervised video object segmentation (UVOS) aims at detecting the primary objects in a given video sequence without any human interposing.
Most existing methods rely on two-stream architectures that separately encode the appearance and motion information before fusing them to identify the target and generate object masks.
We propose a novel UVOS model called SimulFlow that simultaneously performs feature extraction and target identification.
arXiv Detail & Related papers (2023-11-30T06:44:44Z) - Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video Semantic baseline degradation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called SSSS.
Our framework reduces up to 4x FLOPs compared to the per-frame Mask2Former with only up to 2% mIoU on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z) - Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology Synthetic Image Augmentation (SRIA) to enhance generalization capabilities of models encountering 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% for OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z) - Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for
Enhanced Video Forgery Detection [19.432851794777754]
We present a novel approach for the detection of deepfake videos using a pair of vision transformers pre-trained by a self-supervised masked autoencoding setup.
Our method consists of two distinct components, one of which focuses on learning spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from consecutive frames.
arXiv Detail & Related papers (2023-06-12T05:49:23Z) - It Takes Two: Masked Appearance-Motion Modeling for Self-supervised
Video Transformer Pre-training [76.69480467101143]
Self-supervised video transformer pre-training has recently benefited from the mask-and-predict pipeline.
We explicitly investigate motion cues in videos as extra prediction target and propose our Masked Appearance-Motion Modeling framework.
Our method learns generalized video representations and achieves 82.3% on Kinects-400, 71.3% on Something-Something V2, 91.5% on UCF101, and 62.5% on HMDB51.
arXiv Detail & Related papers (2022-10-11T08:05:18Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z) - COVID-19 Monitoring System using Social Distancing and Face Mask
Detection on Surveillance video datasets [0.0]
This paper proposes a comprehensive and effective solution to perform person detection, social distancing violation detection, face detection and face mask classification.
The system performs with an accuracy of 91.2% and F1 score of 90.79% on the labelled video dataset and has an average prediction time of 7.12 seconds for 78 frames of a video.
arXiv Detail & Related papers (2021-10-08T05:57:30Z) - Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to un-observable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.