Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage
- URL: http://arxiv.org/abs/2411.03724v1
- Date: Wed, 06 Nov 2024 07:43:40 GMT
- Title: Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage
- Authors: Claus D. Hansen, Thuy Hai Le, David Campos,
- Abstract summary: This paper examines the use of computer vision algorithms to estimate aspects of the psychosocial work environment using CCTV footage.
We present a proof of concept for a methodology that detects and tracks people in video footage.
We estimate interactions between customers and employees by estimating their poses and calculating the duration of their encounters.
- Score: 0.6632353937719806
- License:
- Abstract: This paper examines the use of computer vision algorithms to estimate aspects of the psychosocial work environment using CCTV footage. We present a proof of concept for a methodology that detects and tracks people in video footage and estimates interactions between customers and employees by estimating their poses and calculating the duration of their encounters. We propose a pipeline that combines existing object detection and tracking algorithms (YOLOv8 and DeepSORT) with pose estimation algorithms (BlazePose) to estimate the number of customers and employees in the footage as well as the duration of their encounters. We use a simple rule-based approach to classify the interactions as positive, neutral or negative based on three different criteria: distance, duration and pose. The proposed methodology is tested on a small dataset of CCTV footage. While the data is quite limited in particular with respect to the quality of the footage, we have chosen this case as it represents a typical setting where the method could be applied. The results show that the object detection and tracking part of the pipeline has a reasonable performance on the dataset with a high degree of recall and reasonable accuracy. At this stage, the pose estimation is still limited to fully detect the type of interactions due to difficulties in tracking employees in the footage. We conclude that the method is a promising alternative to self-reported measures of the psychosocial work environment and could be used in future studies to obtain external observations of the work environment.
Related papers
- Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation [3.3088334148160725]
We introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches.
We aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period.
We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos.
arXiv Detail & Related papers (2024-07-31T21:42:42Z) - Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) needs to concern the high across-frame variation in object appearance, and the diverse deterioration in some frames.
Most of contemporary aggregation methods are tailored for two-stage detectors, suffering from high computational costs.
This study invents a very simple yet potent strategy of feature selection and aggregation, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z) - Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z) - Assisting Blind People Using Object Detection with Vocal Feedback [0.0]
The proposed approach suggests detection of objects in real-time video by using a web camera.
The OpenCV libraries of Python is used to implement the software program.
Image recognition results are transferred to the visually impaired users in audible form by means of Google text-to-speech library.
arXiv Detail & Related papers (2023-12-18T19:28:23Z) - Object-centric Video Representation for Long-term Action Anticipation [33.115854386196126]
Key motivation is that objects provide important cues to recognize and predict human-object interactions.
We propose to build object-centric video representations by leveraging visual-language pretrained models.
To recognize and predict human-object interactions, we use a Transformer-based neural architecture.
arXiv Detail & Related papers (2023-10-31T22:54:31Z) - End-to-end Evaluation of Practical Video Analytics Systems for Face
Detection and Recognition [9.942007083253479]
Video analytics systems are deployed in bandwidth constrained environments like autonomous vehicles.
In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC.
We demonstrate how independent task evaluations, dataset imbalances, and inconsistent annotations can lead to incorrect system performance estimates.
arXiv Detail & Related papers (2023-10-10T19:06:10Z) - Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show existence of biases which have managed to creep into existing methods inspite of careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z) - Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function that we exploit to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.