A Low-Computational Video Synopsis Framework with a Standard Dataset
- URL: http://arxiv.org/abs/2409.05230v1
- Date: Sun, 8 Sep 2024 22:08:36 GMT
- Title: A Low-Computational Video Synopsis Framework with a Standard Dataset
- Authors: Ramtin Malekpour, M. Mehrdad Morsali, Hoda Mohammadzade
- Abstract summary: Video synopsis is an efficient method for condensing surveillance videos.
The lack of a standard dataset for the video synopsis task hinders the comparison of different video synopsis models.
This paper introduces a video synopsis model, called FGS, with low computational cost.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video synopsis is an efficient method for condensing surveillance videos. This technique begins with the detection and tracking of objects, followed by the creation of object tubes. These tubes consist of sequences, each containing chronologically ordered bounding boxes of a unique object. To generate a condensed video, the first step involves rearranging the object tubes to maximize the number of non-overlapping objects in each frame. Then, these tubes are stitched to a background image extracted from the source video. The lack of a standard dataset for the video synopsis task hinders the comparison of different video synopsis models. This paper addresses this issue by introducing a standard dataset, called SynoClip, designed specifically for the video synopsis task. SynoClip includes all the necessary features needed to evaluate various models directly and effectively. Additionally, this work introduces a video synopsis model, called FGS, with low computational cost. The model includes an empty-frame object detector to identify frames empty of any objects, facilitating efficient utilization of the deep object detector. Moreover, a tube grouping algorithm is proposed to maintain relationships among tubes in the synthesized video. This is followed by a greedy tube rearrangement algorithm, which efficiently determines the start time of each tube. Finally, the proposed model is evaluated using the proposed dataset. The source code, fine-tuned object detection model, and tutorials are available at https://github.com/Ramtin-ma/VideoSynopsis-FGS.
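The greedy tube rearrangement step described in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the FGS implementation: the tube representation (a list of per-frame bounding boxes), the IoU-based overlap cost, and the `max_cost` threshold are all assumptions introduced here for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def overlap_cost(tube, start, placed):
    """Total IoU between `tube` (shifted to `start`) and all placed tubes."""
    cost = 0.0
    for other_start, other in placed:
        for t, box in enumerate(tube):
            k = start + t - other_start  # frame index inside the other tube
            if 0 <= k < len(other):
                cost += iou(box, other[k])
    return cost

def greedy_rearrange(tubes, max_cost=0.0):
    """Greedily assign each tube the earliest start time whose overlap
    with already-placed tubes stays within `max_cost`; longer tubes first."""
    placed = []
    for tube in sorted(tubes, key=len, reverse=True):
        start = 0
        while overlap_cost(tube, start, placed) > max_cost:
            start += 1
        placed.append((start, tube))
    return placed
```

With `max_cost=0.0` this forces strictly non-overlapping tubes; raising the threshold trades collisions for a shorter synopsis, which is the tension the rearrangement step is balancing.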
Related papers
- Manipulating a Tetris-Inspired 3D Video Representation [0.0]
Video synopsis is a technique that performs video compression in a way that preserves the activity in the video.
We discuss different object-temporal data representations suitable for different applications.
We explore the application of a packing algorithm to solve the problem of video synopsis.
arXiv Detail & Related papers (2024-07-11T22:41:14Z)
- UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos [52.161513027831646]
We focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV)
We propose a Unified framework UBiSS for the BiSSV task, which models the saliency information in the video and generates a TM-summary and VM-summary simultaneously.
Experiments show that our unified framework achieves better performance than multi-stage summarization pipelines.
arXiv Detail & Related papers (2024-06-24T03:55:25Z)
- VideoSAGE: Video Summarization with Graph Representation Learning [9.21019970479227]
We propose a graph-based representation learning framework for video summarization.
A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting the memory and compute bottleneck.
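The sparsity idea above can be illustrated with a toy sketch (not the VideoSAGE implementation): each frame embedding is connected only to its k temporal neighbors, so the edge count grows as O(N·k) rather than O(N²). The function name, cosine-similarity edge weights, and the choice of k are assumptions made here for illustration.

```python
import numpy as np

def sparse_temporal_graph(features, k=2):
    """Build a sparse edge list (i, j, weight) linking each frame to its
    k neighbors on each side; `features` is an (N, D) array of embeddings."""
    n = len(features)
    # L2-normalize rows so dot products are cosine similarities
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    edges = []
    for i in range(n):
        for j in range(max(0, i - k), min(n, i + k + 1)):
            if i != j:
                edges.append((i, j, float(norm[i] @ norm[j])))
    return edges
```

For N frames this yields at most 2·k edges per node instead of the N-1 of a fully connected graph, which is what keeps memory and compute bounded during training.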
arXiv Detail & Related papers (2024-04-14T15:49:02Z)
- Moving Object Based Collision-Free Video Synopsis [1.55172825097051]
Video synopsis generates a shorter video by exploiting the spatial and temporal redundancies.
We propose a real-time algorithm that incrementally stitches each frame of the synopsis.
Experiments with six common test videos, indoors and outdoors, show that the proposed video synopsis algorithm produces better frame reduction rates than existing approaches.
arXiv Detail & Related papers (2023-09-17T16:49:42Z)
- VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z)
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDO is a novel approach to the prediction of future video frames from past ones.
Individual images are decomposed into multiple layers combining object masks and a small set of control points.
The layer structure is shared across all frames in each video to build dense inter-frame connections.
arXiv Detail & Related papers (2022-11-25T18:59:46Z)
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moiré patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras.
We study how to remove such undesirable moiré patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- A new Video Synopsis Based Approach Using Stereo Camera [0.5801044612920815]
A new method for anomaly detection with object-based unsupervised learning has been developed.
Using this method, the video data is processed at the pixel level and the result is produced as a video segment.
The model we developed has been tested and verified separately for single camera and dual camera systems.
arXiv Detail & Related papers (2021-06-23T12:57:47Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.