Evaluating Temporal Queries Over Video Feeds
- URL: http://arxiv.org/abs/2003.00953v3
- Date: Thu, 5 Mar 2020 22:22:46 GMT
- Title: Evaluating Temporal Queries Over Video Feeds
- Authors: Yueting Chen and Xiaohui Yu and Nick Koudas
- Abstract summary: Temporal queries involving objects and their co-occurrences in video feeds are of interest to many applications ranging from law enforcement to security and safety.
We present an architecture consisting of three layers, namely object detection/tracking, intermediate data generation and query evaluation.
We propose two techniques, MFS and SSG, to organize all detected objects in the intermediate data generation layer.
We also introduce an algorithm called State Traversal (ST) that processes incoming frames against the SSG and efficiently prunes objects and frames unrelated to query evaluation.
- Score: 25.04363138106074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Computer Vision and Deep Learning have made
possible the efficient extraction of a schema from frames of streaming video.
As such, a stream of objects and their associated classes, along with unique
object identifiers derived via object tracking, can be generated, identifying
unique objects as they are captured across frames. In this paper we initiate a study
of temporal queries involving objects and their co-occurrences in video feeds.
For example, queries that identify video segments during which the same two red
cars and the same two humans appear jointly for five minutes are of interest to
many applications ranging from law enforcement to security and safety. We take
the first step and define such queries in a way that they incorporate certain
physical aspects of video capture such as object occlusion. We present an
architecture consisting of three layers, namely object detection/tracking,
intermediate data generation, and query evaluation. We propose two
techniques, MFS and SSG, to organize all detected objects in the intermediate
data generation layer, which, given the queries, effectively minimizes the
number of objects and frames that have to be considered during query
evaluation. We also introduce an algorithm called State Traversal (ST) that
processes incoming frames against the SSG and efficiently prunes objects and
frames unrelated to query evaluation, while maintaining all states required for
succinct query evaluation. We present the results of a thorough experimental
evaluation utilizing both real and synthetic data establishing the trade-offs
between MFS and SSG. We stress various parameters of interest in our evaluation
and demonstrate that the proposed query evaluation methodology, coupled with
the proposed algorithms, can evaluate temporal queries over video feeds
efficiently, achieving orders-of-magnitude performance benefits.
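The segment semantics described in the abstract can be illustrated with a minimal sketch. This is not the paper's MFS, SSG, or ST machinery; all names here are illustrative assumptions, the check uses only per-frame counts of distinct identifiers rather than requiring the same object identifiers throughout the segment, and occlusion handling is omitted:

```python
# Hedged sketch of a temporal co-occurrence query over tracker output:
# report maximal runs of consecutive frames in which enough distinct
# objects of each required class appear jointly for a minimum duration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    object_id: int   # unique identifier assigned by the object tracker
    label: str       # object class, e.g. "car" or "person"

def frame_matches(detections, required):
    """True if the frame contains enough distinct ids per required class."""
    ids_by_label = {}
    for d in detections:
        ids_by_label.setdefault(d.label, set()).add(d.object_id)
    return all(len(ids_by_label.get(lbl, ())) >= k for lbl, k in required.items())

def find_cooccurrence_segments(frames, required, min_len):
    """Return (start, end) index pairs of maximal matching runs >= min_len frames."""
    segments, start = [], None
    for i, dets in enumerate(frames):
        if frame_matches(dets, required):
            if start is None:
                start = i          # a matching run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None           # run broken; discard if too short
    if start is not None and len(frames) - start >= min_len:
        segments.append((start, len(frames) - 1))
    return segments
```

For the example query from the abstract, `required` would be `{"car": 2, "person": 2}` with `min_len` set to five minutes' worth of frames; the paper's approach avoids this sketch's exhaustive per-frame scan by pruning irrelevant objects and frames.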
Related papers
- Spatial-Temporal Multi-level Association for Video Object Segmentation [89.32226483171047]
This paper proposes spatial-temporal multi-level association, which jointly associates reference frame, test frame, and object features.
Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features.
arXiv Detail & Related papers (2024-04-09T12:44:34Z) - Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation [62.275143240798236]
Video semantic segmentation dataset has limited categories per video.
Less than 10% of queries could be matched to receive meaningful gradient updates during VSS training.
Our method achieves state-of-the-art performance on the latest challenging VSS benchmark VSPW without bells and whistles.
arXiv Detail & Related papers (2023-09-14T20:31:06Z) - Temporal Saliency Query Network for Efficient Video Recognition [82.52760040577864]
Video recognition is a hot-spot research topic with the explosive growth of multimedia data on the Internet and mobile devices.
Most existing methods select the salient frames without awareness of the class-specific saliency scores.
We propose a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement.
arXiv Detail & Related papers (2022-07-21T09:23:34Z) - A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object Segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z) - Temporal Query Networks for Fine-grained Video Understanding [88.9877174286279]
We cast this into a query-response mechanism, where each query addresses a particular question, and has its own response label set.
We evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features.
arXiv Detail & Related papers (2021-04-19T17:58:48Z) - Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient unseen-temporal segmentation.
We evaluate the proposed approach on DAVIS-17 and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Learning from Counting: Leveraging Temporal Classification for Weakly Supervised Object Localization and Detection [4.971083368517706]
We introduce scan-order techniques to serialize 2D images into 1D sequence data.
We then leverage a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to achieve object localization.
arXiv Detail & Related papers (2021-03-06T02:18:03Z) - Video Monitoring Queries [16.7214343633499]
We study the problem of interactive declarative query processing on video streams.
We introduce a set of approximate filters to speed up queries that involve objects of specific type.
The filters are able to assess quickly if the query predicates are true to proceed with further analysis of the frame.
arXiv Detail & Related papers (2020-02-24T20:53:35Z)
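The approximate-filter idea summarized in the last entry can be sketched in general terms. This is not that paper's implementation; the cascade structure, function names, and frame fields below are illustrative assumptions about how cheap checks can skip frames before expensive analysis runs:

```python
# Hedged sketch of a filter cascade for video query processing: cheap
# predicates run first, and the expensive per-frame analysis (e.g. a full
# object detector) is invoked only when every filter passes.
def make_filter_cascade(filters, expensive_analysis):
    """filters: list of cheap predicates frame -> bool, ordered cheapest first.
    Returns a processing function that prunes frames failing any filter."""
    def process(frame):
        for passes in filters:
            if not passes(frame):
                return None               # frame pruned without full analysis
        return expensive_analysis(frame)  # all filters passed; analyze fully
    return process
```

The design mirrors the stated goal of assessing quickly whether the query predicates can hold before proceeding with further analysis of the frame: ordering filters from cheapest to most selective keeps the expected per-frame cost low.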
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.