AutoOEP -- A Multi-modal Framework for Online Exam Proctoring
- URL: http://arxiv.org/abs/2509.10887v2
- Date: Wed, 24 Sep 2025 09:29:40 GMT
- Title: AutoOEP -- A Multi-modal Framework for Online Exam Proctoring
- Authors: Aryan Kashyap Naveen, Bhuvanesh Singla, Raajan Wankhade, Shreesha M, Ramu S, Ram Mohana Reddy Guddeti,
- Abstract summary: This paper introduces AutoOEP (Automated Online Exam Proctoring), a comprehensive, multi-modal framework that leverages computer vision and machine learning to provide effective, automated proctoring.<n>The system utilizes a dual-camera setup to capture both a frontal view of the examinee and a side view of the workspace, minimizing blind spots.<n>The Hand Module employs a fine-tuned YOLOv11 model for detecting prohibited items (e.g., mobile phones, notes) and tracks hand proximity to these objects.
- Score: 1.6522310568442877
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The burgeoning of online education has created an urgent need for robust and scalable systems to ensure academic integrity during remote examinations. Traditional human proctoring is often not feasible at scale, while existing automated solutions can be intrusive or fail to detect a wide range of cheating behaviors. This paper introduces AutoOEP (Automated Online Exam Proctoring), a comprehensive, multi-modal framework that leverages computer vision and machine learning to provide effective, automated proctoring. The system utilizes a dual-camera setup to capture both a frontal view of the examinee and a side view of the workspace, minimizing blind spots. Our approach integrates several parallel analyses: the Face Module performs continuous identity verification using ArcFace, along with head pose estimation, gaze tracking, and mouth movement analysis to detect suspicious cues. Concurrently, the Hand Module employs a fine-tuned YOLOv11 model for detecting prohibited items (e.g., mobile phones, notes) and tracks hand proximity to these objects. Features from these modules are aggregated and fed into a Long Short-Term Memory (LSTM) network that analyzes temporal patterns to calculate a real-time cheating probability score. We evaluate AutoOEP on a custom-collected dataset simulating diverse exam conditions. Our system achieves an accuracy of 90.7% in classifying suspicious activities. The object detection component obtains a mean Average Precision (mAP@.5) of 0.57 for prohibited items, and the entire framework processes video streams at approximately 2.4 frames per second without a GPU. The results demonstrate that AutoOEP is an effective and resource-efficient solution for automated proctoring, significantly reducing the need for human intervention and enhancing the integrity of online assessments. The code is public and can be accessed at https://github.com/05kashyap/AutoOEP.
Related papers
- Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition [71.5328300638085]
Zero-shot Human-object interaction (HOI) detection aims to locate humans and objects in images and recognize their interactions.<n>Existing methods, including two-stage methods, tightly couple interaction recognition with a specific detector.<n>We propose a decoupled framework that separates object detection from IR and leverages multi-modal large language models (MLLMs) for zero-shot IR.
arXiv Detail & Related papers (2026-02-16T19:01:31Z) - Real-Time Multi-Modal Embedded Vision Framework for Object Detection Facial Emotion Recognition and Biometric Identification on Low-Power Edge Platforms [0.44219509596259216]
We present a real-time multi-modal vision framework that integrates object detection, owner-specific face recognition, and emotion detection into a unified pipeline deployed on a Raspberry Pi 5 edge platform.<n>Our work demonstrates that context-aware scheduling is the key to unlocking complex multi-modal AI on cost-effective edge hardware.
arXiv Detail & Related papers (2026-01-17T09:06:47Z) - ViSTR-GP: Online Cyberattack Detection via Vision-to-State Tensor Regression and Gaussian Processes in Automated Robotic Operations [5.95097350945477]
Connected and automated factories face growing cybersecurity risks that can potentially cause interruptions and damages to physical operations.<n>Data-integrity attacks often involve sophisticated exploitation of vulnerabilities that enable an attacker to access and manipulate the operational data.<n>This paper develops an online detection framework, ViSTR-GP, that cross-checks encoder-reported measurements against a vision-based estimate from an overhead camera outside the controller's authority.
arXiv Detail & Related papers (2025-09-13T19:10:35Z) - On the Black-box Explainability of Object Detection Models for Safe and Trustworthy Industrial Applications [7.848637922112521]
We focus on model-agnostic explainability methods for object detection models and propose D-P, an extension of the Morphological Fragmental Perturbation Pyramid (P) technique to generate explanations.<n>We evaluate these methods on real-world industrial and robotic datasets, examining the influence of parameters such as the number of masks, model size, and image resolution on the quality of explanations.
arXiv Detail & Related papers (2024-10-28T13:28:05Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - Run-time Introspection of 2D Object Detection in Automated Driving
Systems Using Learning Representations [13.529124221397822]
We introduce a novel introspection solution for 2D object detection based on Deep Neural Networks (DNNs)
We implement several state-of-the-art (SOTA) introspection mechanisms for error detection in 2D object detection, using one-stage and two-stage object detectors evaluated on KITTI and BDD datasets.
Our performance evaluation shows that the proposed introspection solution outperforms SOTA methods, achieving an absolute reduction in the missed error ratio of 9% to 17% in the BDD dataset.
arXiv Detail & Related papers (2024-03-02T10:56:14Z) - Follow Anything: Open-set detection, tracking, and following in
real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed follow anything'' (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z) - Identification of Driver Phone Usage Violations via State-of-the-Art
Object Detection with Tracking [8.147652597876862]
We propose a custom-trained state-of-the-art object detector to work with roadside cameras to capture driver phone usage without the need for human intervention.
The proposed approach also addresses the issues caused by windscreen glare and introduces the steps required to remedy this.
arXiv Detail & Related papers (2021-09-05T16:37:03Z) - Improving Variational Autoencoder based Out-of-Distribution Detection
for Embedded Real-time Applications [2.9327503320877457]
Out-of-distribution (OD) detection is an emerging approach to address the challenge of detecting out-of-distribution in real-time.
In this paper, we show how we can robustly detect hazardous motion around autonomous driving agents.
Our methods significantly improve detection capabilities of OoD factors to unique driving scenarios, 42% better than state-of-the-art approaches.
Our model also generalized near-perfectly, 97% better than the state-of-the-art across the real-world and simulation driving data sets experimented.
arXiv Detail & Related papers (2021-07-25T07:52:53Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Artificial Intelligence Enabled Traffic Monitoring System [3.085453921856008]
This article presents a novel approach to automatically monitor real time traffic footage using deep convolutional neural networks.
The proposed system deploys several state-of-the-art deep learning algorithms to automate different traffic monitoring needs.
arXiv Detail & Related papers (2020-10-02T22:28:02Z) - Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception
Proposal [87.11988786121447]
This paper presents a framework for 3D object detection and tracking for autonomous vehicles.
The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection.
A variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack.
arXiv Detail & Related papers (2020-08-21T20:36:21Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for
Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.