Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model
- URL: http://arxiv.org/abs/2512.04536v1
- Date: Thu, 04 Dec 2025 07:34:04 GMT
- Title: Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model
- Authors: Bita Baroutian, Atefe Aghaei, Mohsen Ebrahimi Moghaddam
- Abstract summary: This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming prior methods.
- Score: 0.4779196219827507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Additionally, we introduce a curated dataset comprising 3,542 video segments derived from 202 individuals to support training and evaluation. Our model is compared against two baselines: a custom 3D-CNN and a VGGFace+LSTM architecture. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming prior methods. The findings demonstrate the model's potential for practical deployment in public safety systems for non-invasive, reliable alcohol intoxication detection.
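The abstract names the two branches (GAT over facial landmarks, 3D ResNet over frames) but gives no implementation details for the fusion step. A minimal NumPy sketch of one plausible reading of "dynamically fused with adaptive prioritization" — a learned gate scores each modality and softmax-normalizes the scores into fusion weights. All names, dimensions, and the linear gate are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_fusion(landmark_feat, visual_feat, w_gate):
    """Fuse two modality feature vectors with adaptive weights.

    Each branch is scored by a (here, linear) gate; the scores are
    softmax-normalized into priorities that weight the sum.
    """
    scores = np.array([
        w_gate[0] @ landmark_feat,   # score for the GAT branch
        w_gate[1] @ visual_feat,     # score for the 3D-ResNet branch
    ])
    alpha = softmax(scores)          # adaptive priorities, sum to 1
    fused = alpha[0] * landmark_feat + alpha[1] * visual_feat
    return fused, alpha

# Stand-in features; in the paper these would come from the trained
# GAT and 3D-ResNet branches respectively.
rng = np.random.default_rng(0)
d = 8
landmark = rng.normal(size=d)
visual = rng.normal(size=d)
gate = rng.normal(size=(2, d))       # hypothetical learned gate weights
fused, alpha = adaptive_fusion(landmark, visual, gate)
```

In a trained model the gate weights would be learned end-to-end with the classifier, letting the network shift priority between landmarks and appearance per clip.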
Related papers
- ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos. Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors. Our method is highly robust to corruptions such as blur and compression, highlighting its applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z)
- Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection [0.0]
This study proposes a framework for automated cattle lameness detection using publicly available video data. Two deep learning architectures were trained and evaluated. The 3D CNN achieved a video-level classification accuracy of 90%, with precision, recall, and F1-score of 85% each, outperforming the ConvLSTM2D model.
arXiv Detail & Related papers (2025-04-23T04:17:41Z)
- Uncertainty Estimation for 3D Object Detection via Evidential Learning [63.61283174146648]
We introduce a framework for quantifying uncertainty in 3D object detection by leveraging an evidential learning loss on Bird's Eye View representations in the 3D detector.
We demonstrate both the efficacy and importance of these uncertainty estimates in identifying out-of-distribution scenes, poorly localized objects, and missing (false negative) detections.
arXiv Detail & Related papers (2024-10-31T13:13:32Z)
- Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis [10.298059998417104]
Repetitive motor behaviors such as spinning, head banging, and arm flapping are key indicators for diagnosis of autism spectrum disorder (ASD). This study focuses on distinguishing between children with ASD and typically developing (TD) peers by analyzing videos captured in natural, uncontrolled environments. We adopt a pipeline integrating YOLOv7-based detection, extensive video augmentations, and the VideoMAE framework, which efficiently captures both spatial and temporal features through a high-ratio masking and reconstruction strategy.
arXiv Detail & Related papers (2024-10-12T02:55:37Z)
- Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis [3.1851272788128644]
Existing AI-based Parkinson's Disease detection methods primarily focus on unimodal analysis of motor or speech tasks. We propose a novel Uncertainty-calibrated Fusion Network (UFNet) that leverages multimodal data to enhance diagnostic accuracy. UFNet significantly outperformed single-task models in terms of accuracy, area under the ROC curve (AUROC), and sensitivity while maintaining non-inferior specificity.
arXiv Detail & Related papers (2024-06-21T04:02:19Z)
- Potion: Towards Poison Unlearning [47.00450933765504]
Adversarial attacks by malicious actors on machine learning systems pose significant risks.
The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified.
Our work addresses two key challenges to advance the state of the art in poison unlearning.
arXiv Detail & Related papers (2024-06-13T14:35:11Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for Video Anomaly Detection [59.04634695294402]
Video anomaly detection (VAD) without human monitoring is a complex computer vision task.
Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information.
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
arXiv Detail & Related papers (2023-08-21T22:42:55Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Learning to Predict Fitness for Duty using Near Infrared Periocular Iris Images [8.79172220232372]
This study focuses on determining the effect of external factors on the Central Nervous System.
The goal is to analyse how this impacts iris and pupil movement behaviours.
This paper proposes a modified MobileNetV2 to classify iris NIR images taken from subjects under the influence of alcohol, drugs, or sleepiness.
arXiv Detail & Related papers (2022-09-04T19:48:45Z)
- LiftFormer: 3D Human Pose Estimation using attention models [0.0]
We propose the usage of attention-based models to obtain more accurate 3D predictions by leveraging attention mechanisms on ordered sequences of human poses in videos.
Our method consistently outperforms the previous best results from the literature when using both 2D keypoint predictors by 0.3 mm (44.8 MPJPE, 0.7% improvement) and ground truth inputs by 2 mm (31.9 MPJPE, 8.4% improvement) on Human3.6M.
Our 3D lifting model's accuracy exceeds that of other end-to-end or SMPL approaches and is comparable to many multi-view methods.
arXiv Detail & Related papers (2020-09-01T11:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences.