An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos
- URL: http://arxiv.org/abs/2312.02576v1
- Date: Tue, 5 Dec 2023 08:48:31 GMT
- Title: An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos
- Authors: Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris
- Abstract summary: We present an integrated system for summarization of 360-degrees videos.
The video summary production mainly involves the detection of salient events and their synopsis into a concise summary.
The analysis relies on state-of-the-art methods for saliency detection in 360-degrees video.
- Score: 6.8292720972215974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present an integrated system for spatiotemporal
summarization of 360-degrees videos. The video summary production mainly
involves the detection of salient events and their synopsis into a concise
summary. The analysis relies on state-of-the-art methods for saliency detection
in 360-degrees video (ATSal and SST-Sal) and video summarization (CA-SUM). It
also contains a mechanism that classifies a 360-degrees video based on the use
of static or moving camera during recording and decides which saliency
detection method will be used, as well as a 2D video production component that
is responsible for creating a conventional 2D video containing the salient events
in the 360-degrees video. Quantitative evaluations using two datasets for
360-degrees video saliency detection (VR-EyeTracking, Sports-360) show the
accuracy and positive impact of the developed decision mechanism, and justify
our choice to use two different methods for detecting the salient events. A
qualitative analysis using content from these datasets gives further insights
into the functionality of the decision mechanism, shows the pros and cons of
each used saliency detection method and demonstrates the advanced performance
of the trained summarization method against a more conventional approach.
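The pipeline described in the abstract hinges on a decision mechanism that classifies the input as recorded with a static or a moving camera and routes it to ATSal or SST-Sal accordingly, before 2D video production and CA-SUM summarization. The sketch below shows one simple way such a classifier could look, thresholding the mean frame-to-frame intensity change over grayscale equirectangular frames; this heuristic, its threshold, and the function name are illustrative assumptions made for this summary, not the mechanism actually proposed in the paper.

```python
import numpy as np


def classify_camera_motion(frames: np.ndarray, threshold: float = 5.0) -> str:
    """Label a 360-degrees clip as recorded with a 'static' or 'moving' camera.

    frames:    (T, H, W) grayscale equirectangular frames with values in [0, 255].
    threshold: mean absolute per-pixel change above which the camera is assumed
               to be moving; an arbitrary placeholder value, not taken from the paper.
    """
    if frames.shape[0] < 2:
        return "static"
    # Mean absolute intensity difference between consecutive frames.
    mean_change = float(np.abs(np.diff(frames.astype(np.float32), axis=0)).mean())
    return "moving" if mean_change > threshold else "static"


# In the integrated system, this decision routes the video to ATSal (static
# camera) or SST-Sal (moving camera) for saliency detection; the salient events
# are then rendered into a conventional 2D video and summarized with CA-SUM.
# Those components are not sketched here.
```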
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods [6.076406622352117]
We introduce a new dataset for 360-degree video summarization: the transformation of 360-degree video content to concise 2D-video summaries.
The dataset includes ground-truth human-generated summaries that can be used for training and objectively evaluating 360-degree video summarization methods.
arXiv Detail & Related papers (2024-06-05T06:43:48Z)
- 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos [16.372814014632944]
We propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS).
The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories.
We benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset.
arXiv Detail & Related papers (2024-04-22T07:54:53Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Evaluating Point Cloud from Moving Camera Videos: A No-Reference Metric [58.309735075960745]
This paper explores the way of dealing with point cloud quality assessment (PCQA) tasks via video quality assessment (VQA) methods.
We generate the captured videos by rotating the camera around the point clouds through several circular pathways.
We extract both spatial and temporal quality-aware features from the selected key frames and the video clips using trainable 2D-CNN and pre-trained 3D-CNN models.
arXiv Detail & Related papers (2022-08-30T08:59:41Z)
- Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A key question is how to define "useful information" and how to distill it from a sequence into one synthetic frame.
IFS consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net [15.032516344808526]
We introduce the 3DST-UNet-RL framework for video summarization.
We show experimental evidence for the effectiveness of 3DST-UNet-RL on two commonly used general video summarization benchmarks.
The proposed video summarization method has the potential to save storage costs of ultrasound screening videos, as well as to increase efficiency when browsing patient video data during retrospective analysis.
arXiv Detail & Related papers (2021-06-19T16:27:19Z)
- ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos [5.831115928056554]
This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360-degree videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
arXiv Detail & Related papers (2020-11-20T19:19:48Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Weakly-Supervised Multi-Person Action Recognition in 360$^{\circ}$ Videos [24.4517195084202]
We address the problem of action recognition in top-view 360$^{\circ}$ videos.
The proposed framework first transforms omnidirectional videos into panoramic videos, then it extracts spatial-temporal features using region-based 3D CNNs for action recognition.
We propose a weakly-supervised method based on multi-instance multi-label learning, which trains the model to recognize and localize multiple actions in a video using only video-level action labels as supervision (a generic illustration of this idea is sketched after this entry).
arXiv Detail & Related papers (2020-02-09T02:17:46Z)
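As a rough illustration of the multi-instance multi-label idea mentioned in the entry above, the sketch below aggregates per-region action scores into video-level predictions by max-pooling and scores them against video-level labels with a binary cross-entropy loss. This is a generic, simplified illustration written for this summary, with made-up array shapes and names; it is not the training procedure of the cited paper.

```python
import numpy as np


def miml_video_loss(region_scores: np.ndarray, video_labels: np.ndarray) -> float:
    """Multi-instance multi-label loss under video-level supervision only.

    region_scores: (num_regions, num_actions) sigmoid scores for the
                   spatio-temporal region proposals of one video.
    video_labels:  (num_actions,) binary vector of actions present in the video.
    """
    # A video is predicted to contain an action if any region scores it highly.
    video_scores = region_scores.max(axis=0)
    eps = 1e-7
    video_scores = np.clip(video_scores, eps, 1.0 - eps)
    # Binary cross-entropy against the video-level labels (the only supervision).
    bce = -(video_labels * np.log(video_scores)
            + (1.0 - video_labels) * np.log(1.0 - video_scores))
    return float(bce.mean())


# Example: 5 region proposals, 3 possible actions, video contains actions 0 and 2.
scores = np.random.rand(5, 3)
labels = np.array([1.0, 0.0, 1.0])
print(miml_video_loss(scores, labels))
```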
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.