Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a
Large Foundational Video Understanding Model
- URL: http://arxiv.org/abs/2401.16280v1
- Date: Mon, 29 Jan 2024 16:37:00 GMT
- Title: Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a
Large Foundational Video Understanding Model
- Authors: Till Grutschus, Ola Karrar, Emir Esenov and Ekta Vats
- Abstract summary: This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video.
A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated.
The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the performance of a large video understanding foundation
model on the downstream task of human fall detection on untrimmed video and
leverages a pretrained vision transformer for multi-class action detection,
with classes: "Fall", "Lying" and "Other/Activities of daily living (ADL)". A
method for temporal action localization that relies on a simple cutup of
untrimmed videos is demonstrated. The methodology includes a preprocessing
pipeline that converts datasets with timestamp action annotations into labeled
datasets of short action clips. Simple and effective clip-sampling strategies
are introduced. The effectiveness of the proposed method has been empirically
evaluated on the publicly available High-Quality Fall Simulation Dataset
(HQFSD). The experimental results validate the performance of the proposed
pipeline. The results are promising for real-time application, and the falls
are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD
dataset under the given experimental settings. The source code will be made
available on GitHub.
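The cutup step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the clip length, the label names, and the maximum-overlap labeling rule are all assumptions made for illustration; the paper's actual clip-sampling strategies are not detailed here.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """A timestamped action annotation from an untrimmed video."""
    start: float  # seconds
    end: float    # seconds
    label: str    # e.g. "Fall", "Lying", "ADL"

def cutup(video_length: float, clip_length: float,
          annotations: list, default_label: str = "ADL") -> list:
    """Slice an untrimmed video into consecutive fixed-length clips and
    label each clip by the annotation it overlaps most (hypothetical rule)."""
    clips = []
    t = 0.0
    while t < video_length:
        end = min(t + clip_length, video_length)
        best_label, best_overlap = default_label, 0.0
        for ann in annotations:
            # Overlap between clip [t, end) and annotation [start, end)
            overlap = min(end, ann.end) - max(t, ann.start)
            if overlap > best_overlap:
                best_label, best_overlap = ann.label, overlap
        clips.append((t, end, best_label))
        t = end
    return clips
```

For example, a 10 s video with a fall annotated at 3 to 5 s, cut into 2 s clips, yields two clips labeled "Fall" and three labeled "ADL".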
Related papers
- HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
arXiv Detail & Related papers (2024-09-16T18:15:38Z)
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse deterioration in some frames.
Most contemporary aggregation methods are tailored for two-stage detectors and suffer from high computational costs.
This study introduces a simple yet potent feature selection and aggregation strategy, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
- DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset [32.236653072212015]
We propose the first open-source dataset and detection method for generated video.
First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions.
Second, we found via probing experiments that spatial artifact-based detectors lack generalizability.
arXiv Detail & Related papers (2024-02-03T08:52:06Z)
- SOAR: Scene-debiasing Open-set Action Recognition [81.8198917049666]
We propose Scene-debiasing Open-set Action Recognition (SOAR), which features an adversarial scene reconstruction module and an adaptive adversarial scene classification module.
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
The latter aims to confuse scene type classification given video features, with a specific emphasis on the action foreground, and helps to learn scene-invariant information.
arXiv Detail & Related papers (2023-09-03T20:20:48Z)
- Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations [17.816344808780965]
The unsupervised video anomaly detection (VAD) problem involves classifying each frame in a video as normal or abnormal, without any access to labels.
To accomplish this, the proposed method employs conditional diffusion models, where the input data is features extracted from a pre-trained network.
Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events.
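The data-driven thresholding idea can be sketched as follows. The mean-plus-k-standard-deviations rule and the value of `k` are assumptions for illustration; the paper's actual threshold derivation and its diffusion model are not reproduced here.

```python
import statistics

def flag_anomalies(errors: list, k: float = 1.0) -> list:
    """Flag frames whose reconstruction error exceeds a data-driven
    threshold: mean + k standard deviations of the observed errors."""
    mu = statistics.mean(errors)
    sigma = statistics.stdev(errors)
    threshold = mu + k * sigma
    # A high reconstruction error indicates a likely anomalous frame
    return [e > threshold for e in errors]
```

A frame whose error stands far above the bulk of the per-frame errors is flagged; the threshold adapts to whatever error scale the data produces, rather than being fixed in advance.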
arXiv Detail & Related papers (2023-07-04T07:36:48Z)
- Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization [20.46053083071752]
We propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF).
LAV-DF consists of strategic content-driven audio, visual and audio-visual manipulations.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture.
arXiv Detail & Related papers (2023-05-03T08:48:45Z)
- Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos.
The proposed framework is validated on a real-world dataset, the ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates a neural model to learn from offline-extracted dense video features.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
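Sparse sampling of this kind can be sketched as picking one representative frame index from a handful of uniform segments per video, instead of processing densely extracted features. This is a simplified stand-in for the idea, not ClipBERT's actual sampling implementation.

```python
def sparse_sample(num_frames: int, num_clips: int) -> list:
    """Return one frame index from the middle of each of num_clips
    uniform segments, rather than densely reading every frame."""
    seg = num_frames / num_clips
    return [int(i * seg + seg / 2) for i in range(num_clips)]
```

For a 100-frame video sampled as 4 clips, this picks frames 12, 37, 62, and 87, so the model sees a small, evenly spread subset of the video at each step.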
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method based on multi-path frame prediction.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-level labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.