ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
- URL: http://arxiv.org/abs/2011.10600v1
- Date: Fri, 20 Nov 2020 19:19:48 GMT
- Title: ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
- Authors: Yasser Dahou, Marouane Tliba, Kevin McGuinness, Noel O'Connor
- Abstract summary: This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
- Score: 5.831115928056554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The spherical domain representation of 360 video/image presents many
challenges related to the storage, processing, transmission and rendering of
omnidirectional videos (ODV). Models of human visual attention can be used so
that only a single viewport is rendered at a time, which is important when
developing systems that allow users to explore ODV with head mounted displays
(HMD). Accordingly, researchers have proposed various saliency models for 360
video/images. This paper proposes ATSal, a novel attention-based (head-eye)
saliency model for 360° videos. The attention mechanism explicitly
encodes global static visual attention allowing expert models to focus on
learning the saliency on local patches throughout consecutive frames. We
compare the proposed approach to other state-of-the-art saliency models on two
datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV
videos (75K+ frames) show that the proposed method outperforms the existing
state-of-the-art.
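The abstract describes a two-stream design: a global attention model that encodes static visual attention over the whole frame, and expert models that learn saliency on local patches across consecutive frames. Below is a minimal PyTorch sketch of that split, assuming simple convolutional encoders and a multiplicative fusion of the two streams; the layer sizes, the fusion rule, and the names GlobalAttentionStream and LocalExpertStream are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a global-attention + local-expert split, in the spirit of
# ATSal. All architectural details here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttentionStream(nn.Module):
    """Encodes a static attention prior over the full equirectangular frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, frame):                       # frame: (B, 3, H, W)
        att = torch.sigmoid(self.head(self.encoder(frame)))
        # Upsample to frame resolution so the prior can modulate local maps.
        return F.interpolate(att, size=frame.shape[-2:], mode="bilinear",
                             align_corners=False)

class LocalExpertStream(nn.Module):
    """Predicts saliency on a local patch (e.g. one viewport)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, patch):                       # patch: (B, 3, h, w)
        return torch.sigmoid(self.net(patch))

if __name__ == "__main__":
    frame = torch.rand(1, 3, 240, 480)              # toy equirectangular frame
    prior = GlobalAttentionStream()(frame)          # (1, 1, 240, 480)
    patch = frame[..., :240, :240]                  # crude "viewport" crop
    local = LocalExpertStream()(patch)              # (1, 1, 240, 240)
    fused = local * prior[..., :240, :240]          # illustrative fusion
    print(fused.shape)
```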
Related papers
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations.
This setting is inherently ill-posed due to minimal overlap among input views and the insufficient visual information they provide.
Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z) - Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z) - See360: Novel Panoramic View Interpolation [24.965259708297932]
See360 is a versatile and efficient framework for 360 panoramic view interpolation using latent space viewpoint estimation.
We show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for four datasets.
arXiv Detail & Related papers (2024-01-07T09:17:32Z) - An Integrated System for Spatio-Temporal Summarization of 360-degrees
Videos [6.8292720972215974]
We present an integrated system for summarization of 360-degrees videos.
The video production mainly involves the detection of events and their synopsis into a concise summary.
The analysis relies on state-of-the-art methods for saliency detection in 360-degrees video.
arXiv Detail & Related papers (2023-12-05T08:48:31Z) - Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction (see the tangent-image projection sketch after this list), and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z) - NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes.
NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images.
Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z) - Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER).
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos [49.217528156417906]
Two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama.
We first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions.
We then leverage advanced 2D full-reference video quality models to compute the perceived quality.
arXiv Detail & Related papers (2020-05-21T10:03:40Z) - Visual Question Answering on 360° Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360 images.
We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)
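Both the local patches used by ATSal's expert models and the tangent images used by SalViT360 rely on re-projecting an equirectangular frame into rectilinear viewports. The snippet below is a self-contained sketch of that re-projection via an inverse gnomonic mapping; the function name tangent_patch, the 90° field of view, the 224-pixel patch size, and the nearest-neighbour sampling are assumptions for illustration, not code from any of the listed papers.

```python
# Hedged sketch: extract a rectilinear tangent-plane viewport from an
# equirectangular frame (inverse gnomonic projection). Parameters are
# illustrative assumptions, not taken from any of the papers above.
import numpy as np

def tangent_patch(equi, lon0, lat0, fov_deg=90.0, size=224):
    """equi: (H, W, 3) array; lon0/lat0: viewport centre in radians."""
    H, W, _ = equi.shape
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, size)
    x, y = np.meshgrid(xs, -xs)                     # y increases upwards
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)
    sin_c, cos_c = np.sin(c), np.cos(c)
    safe_rho = np.where(rho == 0, 1.0, rho)         # avoid 0/0 at the centre
    lat = np.arcsin(np.clip(
        cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / safe_rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(
        x * sin_c, rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    # Spherical coordinates -> equirectangular pixel indices (nearest neighbour).
    u = ((lon / (2 * np.pi) + 0.5) % 1.0) * (W - 1)
    v = (0.5 - lat / np.pi) * (H - 1)
    return equi[v.round().astype(int).clip(0, H - 1),
                u.round().astype(int).clip(0, W - 1)]

# Toy usage: six 90-degree viewports spaced around the equator.
frame = np.random.rand(480, 960, 3)
patches = [tangent_patch(frame, lon, 0.0)
           for lon in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
print(patches[0].shape)                             # (224, 224, 3)
```

Saliency predicted on such patches is typically mapped back onto the sphere by inverting this projection and blending overlapping viewports.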
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.