Multi-Modal Video Forensic Platform for Investigating Post-Terrorist
Attack Scenarios
- URL: http://arxiv.org/abs/2004.01023v1
- Date: Thu, 2 Apr 2020 14:29:27 GMT
- Title: Multi-Modal Video Forensic Platform for Investigating Post-Terrorist
Attack Scenarios
- Authors: Alexander Schindler, Andrew Lindley, Anahid Jalali, Martin Boyer,
Sergiu Gordea, Ross King
- Abstract summary: Large scale Video Analytic Platforms (VAP) assist law enforcement agencies (LEA) in identifying suspects and securing evidence.
We present a video analytic platform that integrates visual and audio analytic modules and fuses information from surveillance cameras and video uploads from eyewitnesses.
- Score: 55.82693757287532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The forensic investigation of a terrorist attack poses a significant
challenge to the investigative authorities, as often several thousand hours of
video footage must be viewed. Large scale Video Analytic Platforms (VAP) assist
law enforcement agencies (LEA) in identifying suspects and securing evidence.
Current platforms focus primarily on the integration of different computer
vision methods and thus are restricted to a single modality. We present a video
analytic platform that integrates visual and audio analytic modules and fuses
information from surveillance cameras and video uploads from eyewitnesses.
Videos are analyzed according to their acoustic and visual content. Specifically,
Audio Event Detection is applied to index the content according to
attack-specific acoustic concepts. Audio similarity search is utilized to
identify similar video sequences recorded from different perspectives. Visual
object detection and tracking are used to index the content according to
relevant concepts. Innovative user-interface concepts are introduced to harness
the full potential of the heterogeneous results of the analytical modules,
allowing investigators to more quickly follow up on leads and eyewitness
reports.
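As a rough illustration of the two audio techniques named above, the sketch below tags segments with attack-specific acoustic concepts and retrieves acoustically similar segments by cosine similarity over embeddings. The concept list, threshold, function names, and toy data are illustrative assumptions, not the platform's actual modules.

```python
import numpy as np

# Hypothetical attack-specific acoustic concepts used for indexing.
CONCEPTS = ["gunshot", "explosion", "siren", "screaming", "glass_breaking"]

def index_segment(scores, threshold=0.5):
    """Map per-concept detector scores for one audio segment to index tags.

    `scores` is a (len(CONCEPTS),) array from an Audio Event Detection
    model (a stand-in here; the platform's detector is not specified).
    """
    return [c for c, s in zip(CONCEPTS, scores) if s >= threshold]

def find_similar_segments(query, corpus, k=5):
    """Indices of the k segments most similar to `query` by cosine similarity.

    `query` is a (d,) audio embedding; `corpus` is an (n, d) matrix of
    segment embeddings from all ingested videos. Matching across videos
    surfaces the same acoustic event recorded from different perspectives.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]

# Toy usage with random stand-in data.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 128))            # 1000 indexed segments
print(index_segment(rng.random(len(CONCEPTS))))  # tags for one segment
print(find_similar_segments(corpus[42], corpus)) # neighbours of segment 42
```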
Related papers
- Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence [13.2968942989609]
We focus on unsupervised video highlight detection, eliminating the need for manual annotations.
Through a clustering technique, we identify pseudo-categories of videos and compute audio pseudo-highlight scores for each video.
We also compute visual pseudo-highlight scores for each video using visual features.
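Read as a recipe, one plausible rendering of the audio side is sketched below: k-means over per-video mean audio features yields pseudo-categories, and a segment scores high when similar audio recurs in other videos of its category. The recurrence-based scoring is my reading of the abstract, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def audio_pseudo_highlights(video_feats, n_categories=10):
    """Cluster videos into pseudo-categories and score segments by recurrence.

    `video_feats[i]` is an (n_segments_i, d) array of audio features for
    video i. Toy sketch: assumes more videos than `n_categories`.
    """
    # Pseudo-categories from clustering per-video mean audio features.
    means = np.stack([f.mean(axis=0) for f in video_feats])
    labels = KMeans(n_clusters=n_categories, n_init=10).fit_predict(means)

    scores = []
    for i, feats in enumerate(video_feats):
        # Pool segments of the *other* videos in the same pseudo-category.
        peers = [video_feats[j] for j in range(len(video_feats))
                 if labels[j] == labels[i] and j != i]
        if not peers:
            scores.append(np.zeros(len(feats)))
            continue
        pool = np.concatenate(peers)
        pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        # Recurrence score: best cosine match against any peer segment.
        scores.append((f @ pool.T).max(axis=1))
    return labels, scores
```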
arXiv Detail & Related papers (2024-07-18T23:09:14Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
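The ensemble idea can be pictured as late fusion of per-modality scores; in the sketch below, scalar probabilities stand in for the actual transformer experts, and the expert names and weights are illustrative assumptions.

```python
def ensemble_deepfake_score(expert_probs, weights=None):
    """Weighted average of per-expert fake probabilities (late fusion).

    Keys such as 'audio', 'visual', and 'audio_visual' are placeholders
    for modality-specific networks; the real AVTENet fuses transformer
    experts, not scalar scores.
    """
    if weights is None:
        weights = {k: 1.0 for k in expert_probs}
    total = sum(weights[k] for k in expert_probs)
    return sum(weights[k] * p for k, p in expert_probs.items()) / total

score = ensemble_deepfake_score({"audio": 0.91, "visual": 0.35,
                                 "audio_visual": 0.78})
print("fake" if score >= 0.5 else "real", round(score, 3))
```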
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective [30.995357472421404]
Video Saliency Prediction (VSP) imitates the selective attention mechanism of the human brain.
Most VSP methods exploit the semantic correlation between the visual and audio modalities but ignore the negative effects caused by temporal inconsistency between the audio and visual signals.
Inspired by the biological inconsistency-correction within multi-sensory information, a consistency-aware audio-visual saliency prediction network (CASP-Net) is proposed.
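One toy way to picture consistency-aware fusion (not CASP-Net itself): gate the audio stream per frame by its agreement with the visual stream, assuming both feature sequences live in a shared d-dimensional space.

```python
import numpy as np

def consistency_weighted_fusion(vis, aud):
    """Fuse per-frame (T, d) visual and audio features, gated by agreement.

    The gate is the per-frame cosine similarity rescaled to [0, 1], so
    audio that is temporally inconsistent with the video contributes
    little to the fused representation.
    """
    v = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    a = aud / np.linalg.norm(aud, axis=1, keepdims=True)
    gate = ((v * a).sum(axis=1, keepdims=True) + 1.0) / 2.0
    return vis + gate * aud
```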
arXiv Detail & Related papers (2023-03-11T09:29:57Z)
- Self-Supervised Video Forensics by Audio-Visual Anomaly Detection [19.842795378751923]
Manipulated videos often contain subtle inconsistencies between their visual and audio signals.
We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies.
We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound.
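The recipe can be sketched with any autoregressive model over synchronization features; below, a toy linear one-step predictor stands in for the paper's model, and the anomaly score is the mean prediction error on a test video. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def fit_ar_predictor(real_seqs):
    """Least-squares one-step predictor x_t ~ x_{t-1} @ W over real videos.

    `real_seqs[i]` is a (T_i, d) sequence of audio-visual synchronization
    features; a toy linear stand-in for an autoregressive model.
    """
    X = np.concatenate([s[:-1] for s in real_seqs])
    Y = np.concatenate([s[1:] for s in real_seqs])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def anomaly_score(seq, W):
    """Mean one-step prediction error; expected to be large on manipulated
    videos whose audio-visual synchronization breaks the learned dynamics."""
    return float(np.mean(np.linalg.norm(seq[1:] - seq[:-1] @ W, axis=1)))
```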
arXiv Detail & Related papers (2023-01-04T18:59:49Z)
- A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key! [25.436683033432086]
Video saliency detection (VSD) aims to quickly locate the most attention-grabbing objects, things, or patterns in a given video clip.
This paper provides an extensive review to bridge the gap between audio-visual fusion and saliency detection.
arXiv Detail & Related papers (2022-06-20T07:25:13Z)
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning for modeling the audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset in surveillance scenes.
We find that introducing audio signals effectively improves anomaly event detection and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been devoted to video segmentation and have delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- APES: Audiovisual Person Search in Untrimmed Video [87.4124877066541]
We present the Audiovisual Person Search dataset (APES).
APES contains over 1.9K identities labeled across 36 hours of video.
A key property of APES is that it includes dense temporal annotations that link faces to speech segments of the same identity.
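Such dense annotations can be pictured as records that tie a face track to a speech segment of the same identity; the schema below is illustrative, not the actual APES release format.

```python
from dataclasses import dataclass

@dataclass
class FaceSpeechLink:
    """One dense temporal annotation (illustrative field names)."""
    identity: str           # person label shared by face and voice
    video_id: str
    face_track: tuple       # (start_sec, end_sec) of the on-screen face
    speech_segment: tuple   # (start_sec, end_sec) of the utterance

def search_person(annotations, identity):
    """Audiovisual person search: clips where `identity` appears and speaks."""
    return [a for a in annotations if a.identity == identity]
```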
arXiv Detail & Related papers (2021-06-03T08:16:42Z)
- Audiovisual Highlight Detection in Videos [78.26206014711552]
We present results from two experiments: an efficacy study of single features on the task, and an ablation study where we leave one feature out at a time.
For the video summarization task, our results indicate that visual features carry most of the information, and that combining audio and visual features improves over visual-only information.
Results indicate that we can transfer knowledge from the video summarization task to a model trained specifically for highlight detection.
arXiv Detail & Related papers (2021-02-11T02:24:00Z)