YOLO11-CR: a Lightweight Convolution-and-Attention Framework for Accurate Fatigue Driving Detection
- URL: http://arxiv.org/abs/2508.13205v1
- Date: Sat, 16 Aug 2025 07:19:04 GMT
- Title: YOLO11-CR: a Lightweight Convolution-and-Attention Framework for Accurate Fatigue Driving Detection
- Authors: Zhebin Jin, Ligang Dong
- Abstract summary: This paper introduces YOLO11-CR, a lightweight and efficient object detection model tailored for real-time fatigue monitoring. YOLO11-CR introduces two key modules: the Convolution-and-Attention Fusion Module (CAFM) and the Rectangular Calibration Module (RCM). Experiments on the DSM dataset demonstrated that YOLO11-CR achieves a precision of 87.17%, recall of 83.86%, mAP@50 of 88.09%, and mAP@50-95 of 55.93%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driver fatigue detection is of paramount importance for intelligent transportation systems due to its critical role in mitigating road traffic accidents. While physiological and vehicle dynamics-based methods offer accuracy, they are often intrusive, hardware-dependent, and lack robustness in real-world environments. Vision-based techniques provide a non-intrusive and scalable alternative, but still face challenges such as poor detection of small or occluded objects and limited multi-scale feature modeling. To address these issues, this paper proposes YOLO11-CR, a lightweight and efficient object detection model tailored for real-time fatigue detection. YOLO11-CR introduces two key modules: the Convolution-and-Attention Fusion Module (CAFM), which integrates local CNN features with global Transformer-based context to enhance feature expressiveness; and the Rectangular Calibration Module (RCM), which captures horizontal and vertical contextual information to improve spatial localization, particularly for profile faces and small objects like mobile phones. Experiments on the DSM dataset demonstrated that YOLO11-CR achieves a precision of 87.17%, recall of 83.86%, mAP@50 of 88.09%, and mAP@50-95 of 55.93%, outperforming baseline models significantly. Ablation studies further validate the effectiveness of the CAFM and RCM modules in improving both sensitivity and localization accuracy. These results demonstrate that YOLO11-CR offers a practical and high-performing solution for in-vehicle fatigue monitoring, with strong potential for real-world deployment and future enhancements involving temporal modeling, multi-modal data integration, and embedded optimization.
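To make the CAFM idea concrete, here is a minimal, illustrative sketch of fusing a local convolutional branch with a global attention branch. This is an assumption-laden toy, not the paper's implementation: it uses 1-D scalar features and plain Python instead of multi-channel image tensors, and the function names (`conv1d`, `self_attention`, `cafm_fuse`) are hypothetical stand-ins for the actual module.

```python
import math

def conv1d(x, kernel):
    """Local branch: 1-D convolution with zero padding (stand-in for the CNN path)."""
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) for i in range(len(x))]

def self_attention(x):
    """Global branch: scalar dot-product self-attention (stand-in for the Transformer path)."""
    n = len(x)
    out = []
    for i in range(n):
        scores = [x[i] * x[j] for j in range(n)]            # query . key for each position
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]            # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append(sum(w * v for w, v in zip(weights, x)))  # attention-weighted sum of values
    return out

def cafm_fuse(x, kernel=(0.25, 0.5, 0.25)):
    """Fuse local and global context by elementwise addition, one fused value per position."""
    local = conv1d(x, kernel)
    global_ctx = self_attention(x)
    return [l + g for l, g in zip(local, global_ctx)]

features = [0.1, 0.9, 0.2, 0.8]
fused = cafm_fuse(features)
print(len(fused))  # one fused feature per input position
```

The design point the sketch captures is that the convolution sees only a small neighborhood (the kernel window), while attention mixes information from every position; summing the two branches gives each output both local detail and global context.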
Related papers
- RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models [48.91205564876609]
We propose a cost-effective and highly adaptable distillation framework to enhance lightweight object detectors. Our approach painlessly delivers striking and consistent performance gains across diverse DETR-based models. Our new model family, RT-DETRv4, achieves state-of-the-art results on COCO, attaining AP scores of 49.7/53.5/55.4/57.0 at corresponding speeds of 273/169/124/78 FPS.
arXiv Detail & Related papers (2025-10-29T08:13:17Z) - Investigating Traffic Accident Detection Using Multimodal Large Language Models [3.4123736336071864]
This research investigates the zero-shot capabilities of multimodal large language models (MLLMs) for detecting and describing traffic accidents. Results show Pixtral as the top performer, with an F1-score of 71% and 83% recall. These findings demonstrate the substantial potential of integrating MLLMs with advanced visual analytics techniques.
arXiv Detail & Related papers (2025-09-23T14:47:33Z) - YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries [0.0]
This paper introduces YOLO-APD, a novel deep learning architecture enhancing the YOLOv8 framework specifically for this challenge. YOLO-APD achieves state-of-the-art detection accuracy, reaching 77.7% mAP@0.5:0.95 and exceptional pedestrian recall exceeding 96%. It maintains real-time processing capabilities at 100 FPS, showcasing a superior balance between accuracy and efficiency.
arXiv Detail & Related papers (2025-07-07T18:03:40Z) - World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks [53.98633183204453]
In this paper, a novel world model-based learning framework is proposed to minimize packet-completeness-aware age of information (CAoI) in a vehicular network. The framework jointly learns a dynamic model of the mmWave V2X environment and uses it to imagine trajectories for learning how to perform link scheduling. In particular, the long-term policy is learned in differentiable imagined trajectories instead of environment interactions.
arXiv Detail & Related papers (2025-05-03T06:23:18Z) - VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences. Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and an accuracy of up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z) - YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection via Prior-Guided Enhancement and Multi-Branch Feature Interaction [45.79993863157494]
YOLO-LLTS is an end-to-end real-time traffic sign detection algorithm specifically designed for low-light environments. YOLO-LLTS introduces three main contributions: the High-Resolution Feature Map for Small Object Detection (HRFM-SOD), the Multi-branch Feature Interaction Attention (MFIA), and the Prior-Guided Feature Enhancement Module (PGFE). Experiments show that YOLO-LLTS achieves state-of-the-art performance, outperforming previous best methods by 2.7% mAP50 and 1.6% mAP50:95 on TT100K-night.
arXiv Detail & Related papers (2025-03-18T04:28:05Z) - AutoML for Multi-Class Anomaly Compensation of Sensor Drift [44.63945828405864]
Sensor drift degrades the performance of machine learning models over time. The standard cross-validation method overestimates performance by inadequately accounting for drift. This paper presents two solutions: (1) a novel sensor drift compensation learning paradigm for validating models, and (2) automated machine learning (AutoML) techniques to enhance classification performance and compensate for sensor drift.
arXiv Detail & Related papers (2025-02-26T14:34:53Z) - Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving [3.617580194719686]
This paper introduces Fast-COS, a novel single-stage object detection framework crafted specifically for driving scenes. RAViT achieves 81.4% Top-1 accuracy on the ImageNet-1K dataset. It surpasses leading models in efficiency, delivering up to 75.9% faster GPU inference and 1.38× higher throughput on edge devices.
arXiv Detail & Related papers (2025-02-11T09:54:09Z) - Interpretable Dynamic Graph Neural Networks for Small Occluded Object Detection and Tracking [0.0]
This paper introduces DGNN-YOLO, a novel framework that integrates dynamic graph neural networks (DGNNs) with YOLO11 to address these limitations. Unlike standard GNNs, DGNNs are chosen for their superior ability to dynamically update graph structures in real time. The framework constructs and regularly updates its graph representations, capturing objects as nodes and their interactions as edges.
arXiv Detail & Related papers (2024-11-26T09:29:27Z) - Federated Learning framework for LoRaWAN-enabled IIoT communication: A case study [41.831392507864415]
Anomaly detection plays a crucial role in preventive maintenance and spotting irregularities in industrial components.
Traditional Machine Learning faces challenges in deploying anomaly detection models in resource-constrained environments like LoRaWAN.
Federated Learning (FL) solves this problem by enabling distributed model training, addressing privacy concerns, and minimizing data transmission.
arXiv Detail & Related papers (2024-10-15T13:48:04Z) - A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head [0.0]
Current state-of-the-art deep learning (DL)-based damage detection models often lack superior feature extraction capability in complex and noisy environments.
DenseSPH-YOLOv5 is a real-time DL-based high-performance damage detection model where DenseNet blocks have been integrated with the backbone.
DenseSPH-YOLOv5 obtains a mean average precision (mAP) value of 85.25%, an F1-score of 81.18%, and a precision (P) value of 89.51%, outperforming current state-of-the-art models.
arXiv Detail & Related papers (2023-03-07T22:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences arising from its use.