Knowledge Graph Driven Approach to Represent Video Streams for
Spatiotemporal Event Pattern Matching in Complex Event Processing
- URL: http://arxiv.org/abs/2007.06292v1
- Date: Mon, 13 Jul 2020 10:20:58 GMT
- Title: Knowledge Graph Driven Approach to Represent Video Streams for
Spatiotemporal Event Pattern Matching in Complex Event Processing
- Authors: Piyush Yadav, Dhaval Salwala, Edward Curry
- Abstract summary: Complex Event Processing (CEP) is an event processing paradigm to perform real-time analytics over streaming data.
Video streams are difficult to process due to their unstructured data model, which limits the ability of CEP systems to perform matching over them.
This work introduces a graph-based structure for continuously evolving video streams, which enables the CEP system to query complex video event patterns.
- Score: 5.220940151628734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex Event Processing (CEP) is an event processing paradigm for performing
real-time analytics over streaming data and matching high-level event patterns.
Presently, CEP is limited to processing structured data streams; video streams are
difficult because their unstructured data model prevents CEP systems from
performing matching over them. This work introduces a graph-based structure for
continuously evolving video streams, which enables the CEP system to query
complex video event patterns. We propose the Video Event Knowledge Graph
(VEKG), a graph-driven representation of video data. VEKG models video objects
as nodes and their relationship interactions as edges over time and space. It
creates a semantic knowledge representation of video data, derived by detecting
high-level semantic concepts in the video with an ensemble of deep learning models.
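To make the node/edge model concrete, here is a minimal sketch, assuming networkx, a hypothetical `detector(frame)` callable, and an illustrative `near` distance threshold; the node and edge attribute names are stand-ins for exposition, not the paper's exact schema.

```python
import networkx as nx

def centroid_distance(bbox_a, bbox_b):
    """Euclidean distance between bounding-box centroids."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = bbox_a, bbox_b
    dx = (xa + wa / 2.0) - (xb + wb / 2.0)
    dy = (ya + ha / 2.0) - (yb + hb / 2.0)
    return (dx * dx + dy * dy) ** 0.5

def build_vekg(frames, detector, near_px=50.0):
    """Build a VEKG-like graph: one node per detected object per frame.

    `detector(frame)` is a hypothetical callable returning dicts such as
    {"track_id": 3, "label": "car", "bbox": (x, y, w, h)}.
    """
    g = nx.DiGraph()
    for t, frame in enumerate(frames):
        for det in detector(frame):
            node = (t, det["track_id"])
            g.add_node(node, label=det["label"], bbox=det["bbox"], time=t)
            # Temporal edge: link the same tracked object across frames.
            prev = (t - 1, det["track_id"])
            if g.has_node(prev):
                g.add_edge(prev, node, relation="same_object")
        # Spatial edges: connect co-occurring objects whose centroids are close.
        current = [n for n, d in g.nodes(data=True) if d["time"] == t]
        for i, a in enumerate(current):
            for b in current[i + 1:]:
                if centroid_distance(g.nodes[a]["bbox"], g.nodes[b]["bbox"]) <= near_px:
                    g.add_edge(a, b, relation="near")
    return g
```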
A CEP-based state optimization, the VEKG-Time Aggregated
Graph (VEKG-TAG), is proposed over the VEKG representation for faster event
detection. VEKG-TAG is a spatiotemporal graph aggregation method that provides
a summarized view of a VEKG over a given time length.
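Continuing the hypothetical schema above, the aggregation step might look like the following sketch: per-frame object nodes collapse into one node per tracked object, and each spatial edge records the frame times at which the relation held, so a matcher searches one small summary graph instead of one graph per frame.

```python
import networkx as nx

def aggregate_vekg(vekg, t_start, t_end):
    """Summarize a per-frame VEKG over [t_start, t_end]: one node per
    object, with edge 'times' listing the frames where a relation held."""
    tag = nx.DiGraph()
    for (t, obj_id), data in vekg.nodes(data=True):
        if t_start <= t <= t_end:
            if not tag.has_node(obj_id):
                tag.add_node(obj_id, label=data["label"], times=[])
            tag.nodes[obj_id]["times"].append(t)
    for (t, u), (_, v), data in vekg.edges(data=True):
        # Temporal identity edges are absorbed by the node merge above.
        if data["relation"] == "same_object" or not (t_start <= t <= t_end):
            continue
        if not tag.has_edge(u, v):
            tag.add_edge(u, v, relation=data["relation"], times=[])
        tag[u][v]["times"].append(t)
    return tag
```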
We defined a set of nine event pattern rules for two domains (Activity Recognition and Traffic
Management), which act as queries applied over VEKG graphs to discover
complex event patterns. To show the efficacy of our approach, we performed
extensive experiments over 801 video clips across 10 datasets. The proposed
VEKG approach was compared with other state-of-the-art methods and detected
complex event patterns over videos with F-Scores ranging from 0.44 to
0.90. In the given experiments, the optimized VEKG-TAG reduced VEKG nodes and
edges by 99% and 93%, respectively, with a 5.19X faster search time,
achieving sub-second median latencies of 4-20 milliseconds.
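As an illustration of how such a rule could be evaluated over the aggregated graph, the sketch below checks a hypothetical Traffic Management style pattern ("a person and a car stayed near each other for at least N frames"); the labels, relation name, and rule are stand-ins, not one of the paper's nine rules.

```python
def person_near_car(tag, min_frames=10):
    """Return (u, v, first_t, last_t) for every person/car pair that was
    'near' in at least `min_frames` frames of the aggregated window."""
    matches = []
    for u, v, data in tag.edges(data=True):
        labels = {tag.nodes[u]["label"], tag.nodes[v]["label"]}
        if data["relation"] == "near" and labels == {"person", "car"}:
            if len(data["times"]) >= min_frames:
                matches.append((u, v, min(data["times"]), max(data["times"])))
    return matches
```

In a CEP setting, rules like this would be registered once and re-evaluated as each new aggregated window arrives from the stream.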
Related papers
- Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic
Role Labeling [96.64607294592062]
Video Semantic Role Labeling (VidSRL) aims to detect salient events from given videos.
Recent endeavors have put forth methods for VidSRL, but they can be subject to two key drawbacks.
arXiv Detail & Related papers (2023-08-09T17:20:14Z) - TAPIR: Tracking Any Point with per-frame Initialization and temporal
Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z) - Visually-aware Acoustic Event Detection using Heterogeneous Graphs [39.90352230010103]
Perception of auditory events is inherently multimodal relying on both audio and visual cues.
We employ heterogeneous graphs to capture the spatial and temporal relationships between the modalities.
We show efficient modelling of intra- and inter-modality relationships at both spatial and temporal scales.
arXiv Detail & Related papers (2022-07-16T13:09:25Z) - End-to-End Compressed Video Representation Learning for Generic Event
Boundary Detection [31.31508043234419]
We propose a new end-to-end compressed video representation learning for event boundary detection.
We first use the ConvNets to extract features of the I-frames in the GOPs.
After that, a light-weight spatial-channel compressed encoder is designed to compute the feature representations of the P-frames.
A temporal contrastive module is proposed to determine the event boundaries of video sequences.
arXiv Detail & Related papers (2022-03-29T08:27:48Z) - Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]
We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which builds space-time graphs and clusters them into compact sub-graphs on each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z) - TCGL: Temporal Contrastive Graph for Self-supervised Video
Representation Learning [79.77010271213695]
We propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL).
Our TCGL integrates prior knowledge about frame and snippet orders into graph structures, i.e., the intra-/inter-snippet Temporal Contrastive Graphs (TCG).
To generate supervisory signals for unlabeled videos, we introduce an Adaptive Snippet Order Prediction (ASOP) module.
arXiv Detail & Related papers (2021-12-07T09:27:56Z) - Video Is Graph: Structured Graph Module for Video Action Recognition [34.918667614077805]
We transform a video sequence into a graph to obtain direct long-term dependencies among temporal frames.
In particular, SGM divides the neighbors of each node into several temporal regions so as to extract global structural information.
The reported performance and analysis demonstrate that SGM can achieve outstanding precision with less computational complexity.
arXiv Detail & Related papers (2021-10-12T11:27:29Z) - Target Adaptive Context Aggregation for Video Scene Graph Generation [36.669700084337045]
This paper deals with the challenging task of video scene graph generation (VidSGG).
We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from complicated low-level entity tracking.
arXiv Detail & Related papers (2021-08-18T12:46:28Z) - Learning Multi-Granular Hypergraphs for Video-Based Person
Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision.
We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities.
Using MGH, 90.0% top-1 accuracy is achieved on MARS, outperforming state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z) - VidCEP: Complex Event Processing Framework to Detect Spatiotemporal
Patterns in Video Streams [5.53329677986653]
Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion.
Current CEP systems have inherent limitations to query video streams due to their unstructured data model and expressive query language.
We propose VidCEP, an in-memory, near real-time complex event matching framework for video streams.
arXiv Detail & Related papers (2020-07-15T16:43:37Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)