ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
- URL: http://arxiv.org/abs/2011.10600v1
- Date: Fri, 20 Nov 2020 19:19:48 GMT
- Title: ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
- Authors: Yasser Dahou, Marouane Tliba, Kevin McGuinness, Noel O'Connor
- Abstract summary: This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
- Score: 5.831115928056554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The spherical domain representation of 360 video/image presents many
challenges related to the storage, processing, transmission and rendering of
omnidirectional videos (ODV). Models of human visual attention can be used so
that only a single viewport is rendered at a time, which is important when
developing systems that allow users to explore ODV with head mounted displays
(HMD). Accordingly, researchers have proposed various saliency models for 360
video/images. This paper proposes ATSal, a novel attention-based (head-eye)
saliency model for 360° videos. The attention mechanism explicitly
encodes global static visual attention allowing expert models to focus on
learning the saliency on local patches throughout consecutive frames. We
compare the proposed approach to other state-of-the-art saliency models on two
datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV
videos (75K+ frames) show that the proposed method outperforms the existing
state-of-the-art.
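The abstract describes a two-stream design: a global attention model that encodes static visual attention over the whole frame, and expert models that learn saliency on local patches across consecutive frames. Below is a minimal PyTorch sketch of that split, assuming simple convolutional encoders and a multiplicative fusion of the two streams; the layer sizes, the fusion rule, and the names GlobalAttentionStream and LocalExpertStream are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a global-attention + local-expert split, in the spirit of
# ATSal. All architectural details here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttentionStream(nn.Module):
    """Encodes a static attention prior over the full equirectangular frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, frame):                       # frame: (B, 3, H, W)
        att = torch.sigmoid(self.head(self.encoder(frame)))
        # Upsample to frame resolution so the prior can modulate local maps.
        return F.interpolate(att, size=frame.shape[-2:], mode="bilinear",
                             align_corners=False)

class LocalExpertStream(nn.Module):
    """Predicts saliency on a local patch (e.g. one viewport)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, patch):                       # patch: (B, 3, h, w)
        return torch.sigmoid(self.net(patch))

if __name__ == "__main__":
    frame = torch.rand(1, 3, 240, 480)              # toy equirectangular frame
    prior = GlobalAttentionStream()(frame)          # (1, 1, 240, 480)
    patch = frame[..., :240, :240]                  # crude "viewport" crop
    local = LocalExpertStream()(patch)              # (1, 1, 240, 240)
    fused = local * prior[..., :240, :240]          # illustrative fusion
    print(fused.shape)
```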
Related papers
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations.
This setting is inherently ill-posed due to minimal overlap among input views and the insufficient visual information they provide.
Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z) - Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z) - See360: Novel Panoramic View Interpolation [24.965259708297932]
See360 is a versatile and efficient framework for 360 panoramic view interpolation using latent space viewpoint estimation.
We show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for four datasets.
arXiv Detail & Related papers (2024-01-07T09:17:32Z) - An Integrated System for Spatio-Temporal Summarization of 360-degrees
Videos [6.8292720972215974]
We present an integrated system for summarization of 360-degrees videos.
The video production mainly involves the detection of events and their synopsis into a concise summary.
The analysis relies on state-of-the-art methods for saliency detection in 360-degrees video.
arXiv Detail & Related papers (2023-12-05T08:48:31Z) - Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction (see the tangent-image projection sketch after this list), and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z) - NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes.
NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images.
Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z) - Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER).
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos [49.217528156417906]
Two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama.
We first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions.
We then leverage advanced 2D full-reference video quality models to compute the perceived quality.
arXiv Detail & Related papers (2020-05-21T10:03:40Z) - Visual Question Answering on 360° Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360 images.
We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)
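Both the local patches used by ATSal's expert models and the tangent images used by SalViT360 rely on re-projecting an equirectangular frame into rectilinear viewports. The snippet below is a self-contained sketch of that re-projection via an inverse gnomonic mapping; the function name tangent_patch, the 90° field of view, the 224-pixel patch size, and the nearest-neighbour sampling are assumptions for illustration, not code from any of the listed papers.

```python
# Hedged sketch: extract a rectilinear tangent-plane viewport from an
# equirectangular frame (inverse gnomonic projection). Parameters are
# illustrative assumptions, not taken from any of the papers above.
import numpy as np

def tangent_patch(equi, lon0, lat0, fov_deg=90.0, size=224):
    """equi: (H, W, 3) array; lon0/lat0: viewport centre in radians."""
    H, W, _ = equi.shape
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, size)
    x, y = np.meshgrid(xs, -xs)                     # y increases upwards
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)
    sin_c, cos_c = np.sin(c), np.cos(c)
    safe_rho = np.where(rho == 0, 1.0, rho)         # avoid 0/0 at the centre
    lat = np.arcsin(np.clip(
        cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / safe_rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(
        x * sin_c, rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    # Spherical coordinates -> equirectangular pixel indices (nearest neighbour).
    u = ((lon / (2 * np.pi) + 0.5) % 1.0) * (W - 1)
    v = (0.5 - lat / np.pi) * (H - 1)
    return equi[v.round().astype(int).clip(0, H - 1),
                u.round().astype(int).clip(0, W - 1)]

# Toy usage: six 90-degree viewports spaced around the equator.
frame = np.random.rand(480, 960, 3)
patches = [tangent_patch(frame, lon, 0.0)
           for lon in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
print(patches[0].shape)                             # (224, 224, 3)
```

Saliency predicted on such patches is typically mapped back onto the sphere by inverting this projection and blending overlapping viewports.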
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.