VidCEP: Complex Event Processing Framework to Detect Spatiotemporal
Patterns in Video Streams
- URL: http://arxiv.org/abs/2007.07817v1
- Date: Wed, 15 Jul 2020 16:43:37 GMT
- Title: VidCEP: Complex Event Processing Framework to Detect Spatiotemporal
Patterns in Video Streams
- Authors: Piyush Yadav, Edward Curry
- Abstract summary: Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion.
Current CEP systems have inherent limitations in querying video streams due to their unstructured data model and lack of an expressive query language.
We propose VidCEP, an in-memory, near real-time complex event matching framework for video streams.
- Score: 5.53329677986653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video data is highly expressive and has traditionally been very difficult for
a machine to interpret. Querying event patterns from video streams is
challenging due to its unstructured representation. Middleware systems such as
Complex Event Processing (CEP) mine patterns from data streams and send
notifications to users in a timely fashion. Current CEP systems have inherent
limitations to query video streams due to their unstructured data model and
lack of expressive query language. In this work, we focus on a CEP framework
where users can define high-level expressive queries over videos to detect a
range of spatiotemporal event patterns. In this context, we propose: i) VidCEP,
an in-memory, on the fly, near real-time complex event matching framework for
video streams. The system uses a graph-based event representation for video
streams which enables the detection of high-level semantic concepts from video
using cascades of Deep Neural Network models, ii) a Video Event Query language
(VEQL) to express high-level user queries for video streams in CEP, iii) a
complex event matcher to detect spatiotemporal video event patterns by matching
expressive user queries over video data. The proposed approach detects
spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89.
VidCEP maintains near real-time performance with an average throughput of 70
frames per second for 5 parallel videos with sub-second matching latency.
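To make the pipeline described in the abstract more concrete, the sketch below models each video frame as a small graph of detected objects (nodes) with derived spatial relations (edges) and runs a toy spatiotemporal check over consecutive frames. This is a minimal illustration under assumed names: FrameGraph, ObjectNode, the LEFT_OF relation, and match_sequence are hypothetical and are not VidCEP's actual data structures, nor VEQL's actual syntax.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch: a frame becomes a small graph whose nodes are objects
# reported by a DNN detector and whose edges are spatial relations among them.
# Names and structures are illustrative, not VidCEP's real API.

@dataclass
class ObjectNode:
    label: str                       # semantic concept, e.g. "person"
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) from the detector

@dataclass
class FrameGraph:
    timestamp: float
    nodes: List[ObjectNode] = field(default_factory=list)

    def relations(self) -> List[Tuple[str, str, str]]:
        """Derive pairwise (subject, predicate, object) spatial relations."""
        rels = []
        for a in self.nodes:
            for b in self.nodes:
                if a is b:
                    continue
                if a.bbox[2] <= b.bbox[0]:  # a lies entirely left of b
                    rels.append((a.label, "LEFT_OF", b.label))
        return rels

def match_sequence(frames: List[FrameGraph],
                   pattern: Tuple[str, str, str],
                   min_consecutive: int) -> bool:
    """Toy spatiotemporal matcher: does `pattern` hold in at least
    `min_consecutive` consecutive frame graphs of the window?"""
    run = 0
    for fg in frames:
        if pattern in fg.relations():
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False

# Illustrative use for a VEQL-like query such as
# "person LEFT_OF car WITHIN 3 consecutive frames".
window = [
    FrameGraph(0.00, [ObjectNode("person", (0, 0, 50, 100)),
                      ObjectNode("car", (60, 0, 200, 100))]),
    FrameGraph(0.04, [ObjectNode("person", (5, 0, 55, 100)),
                      ObjectNode("car", (60, 0, 200, 100))]),
    FrameGraph(0.08, [ObjectNode("person", (10, 0, 60, 100)),
                      ObjectNode("car", (65, 0, 200, 100))]),
]
print(match_sequence(window, ("person", "LEFT_OF", "car"), min_consecutive=3))  # True
```

In the actual system, node labels would come from cascades of DNN models and the matcher would evaluate VEQL queries over the incoming graph stream; the toy above only shows why a per-frame graph makes spatial predicates, and their persistence over time, straightforward to test.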
Related papers
- Event-aware Video Corpus Moment Retrieval [79.48249428428802]
Video Corpus Moment Retrieval (VCMR) is a practical video retrieval task focused on identifying a specific moment within a vast corpus of untrimmed videos.
Existing methods for VCMR typically rely on frame-aware video retrieval, calculating similarities between the query and video frames to rank videos.
We propose EventFormer, a model that explicitly utilizes events within videos as fundamental units for video retrieval.
arXiv Detail & Related papers (2024-02-21T06:55:20Z)
- Local Compressed Video Stream Learning for Generic Event Boundary Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
Existing methods typically require video frames to be decoded before being fed into the network.
We propose a novel event boundary detection method that is fully end-to-end leveraging rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos via cross-modal queries.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- Query-Dependent Video Representation for Moment Retrieval and Highlight Detection [8.74967598360817]
The key objective of MR/HD is to localize the moment and estimate the clip-wise accordance level, i.e., the saliency score, with respect to a given text query.
Recent transformer-based models do not fully exploit the information of a given query.
We introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD.
arXiv Detail & Related papers (2023-03-24T09:32:50Z)
- Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval [20.493241098064665]
Video corpus moment retrieval (VCMR) is the task of retrieving the most relevant video moment from a large video corpus using a natural language query.
We propose a self-supervised learning framework: the Modal-specific Pseudo Query Generation Network (MPGN).
MPGN generates pseudo queries exploiting both visual and textual information from selected temporal moments.
We show that MPGN successfully learns to localize the video corpus moment without any explicit annotation.
arXiv Detail & Related papers (2022-10-23T05:05:18Z)
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z)
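As a rough illustration of the QVHighlights annotation scheme listed above, a single record could be shaped as follows; the field names and values are hypothetical and do not reflect the dataset's actual JSON keys.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative only: field names are hypothetical, not QVHighlights' real schema.
@dataclass
class MomentAnnotation:
    query: str                                    # human-written free-form NL query
    relevant_moments: List[Tuple[float, float]]   # (start_sec, end_sec) spans w.r.t. the query
    saliency_scores: List[int]                    # five-point scale score per query-relevant clip

example = MomentAnnotation(
    query="a person parks a car and walks into a building",
    relevant_moments=[(12.0, 26.0), (58.0, 64.0)],
    saliency_scores=[4, 5, 3, 4, 2],
)
```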
- DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization [127.16984421969529]
We introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS.
DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence.
We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.
arXiv Detail & Related papers (2021-05-13T17:33:26Z)
- Video Corpus Moment Retrieval with Contrastive Learning [56.249924768243375]
Video corpus moment retrieval (VCMR) aims to retrieve a temporal moment that semantically corresponds to a given text query.
We propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR.
ReLoCLNet encodes text and video separately for efficiency; experimental results show that its retrieval accuracy is comparable with that of baselines adopting cross-modal interaction learning.
arXiv Detail & Related papers (2021-05-13T12:54:39Z)
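The ReLoCLNet entry above rests on two ideas: text and video are encoded by separate towers, and the towers are trained with a contrastive objective. The generic dual-encoder sketch below with a symmetric InfoNCE-style loss illustrates that setup; it is not ReLoCLNet's actual architecture, and all layer sizes are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic dual-encoder sketch (NOT ReLoCLNet's real model): text and video are
# embedded by separate towers, so video embeddings can be indexed offline and
# retrieval reduces to a similarity search at query time.
class DualEncoder(nn.Module):
    def __init__(self, text_dim=300, video_dim=1024, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Sequential(nn.Linear(text_dim, embed_dim), nn.ReLU(),
                                       nn.Linear(embed_dim, embed_dim))
        self.video_proj = nn.Sequential(nn.Linear(video_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))

    def forward(self, text_feats, video_feats):
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        return t, v

def contrastive_loss(t, v, temperature=0.07):
    """Symmetric InfoNCE: matched (text, video) pairs sit on the diagonal."""
    logits = t @ v.t() / temperature
    targets = torch.arange(t.size(0), device=t.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random pooled features for a batch of 8 query-video pairs.
model = DualEncoder()
t, v = model(torch.randn(8, 300), torch.randn(8, 1024))
loss = contrastive_loss(t, v)
loss.backward()
```

Because the two towers never interact until the final dot product, every video in the corpus can be embedded once and reused for all queries, which is the efficiency argument made in the summary.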
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Knowledge Graph Driven Approach to Represent Video Streams for Spatiotemporal Event Pattern Matching in Complex Event Processing [5.220940151628734]
Complex Event Processing (CEP) is an event processing paradigm to perform real-time analytics over streaming data.
Video streams are complex due to their unstructured data model, which limits the ability of CEP systems to perform matching over them.
This work introduces a graph-based structure for continuously evolving video streams, which enables the CEP system to query complex video event patterns.
arXiv Detail & Related papers (2020-07-13T10:20:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.