Sphere-GAN: a GAN-based Approach for Saliency Estimation in 360° Videos
- URL: http://arxiv.org/abs/2509.11948v1
- Date: Mon, 15 Sep 2025 14:07:33 GMT
- Title: Sphere-GAN: a GAN-based Approach for Saliency Estimation in 360° Videos
- Authors: Mahmoud Z. A. Wahba, Sara Baldoni, Federica Battisti
- Abstract summary: Saliency estimation provides a powerful tool to identify visually relevant areas. We introduce Sphere-GAN, a saliency detection model for 360° videos that leverages a Generative Adversarial Network with spherical convolutions.
- Score: 5.66239168125163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success of immersive applications is pushing the research community to define new approaches to process 360° images and videos and optimize their transmission. Among these, saliency estimation provides a powerful tool that can be used to identify visually relevant areas and, consequently, adapt processing algorithms. Although saliency estimation has been widely investigated for 2D content, very few algorithms have been proposed for 360° saliency estimation. Towards this goal, we introduce Sphere-GAN, a saliency detection model for 360° videos that leverages a Generative Adversarial Network with spherical convolutions. Extensive experiments were conducted using a public 360° video saliency dataset, and the results demonstrate that Sphere-GAN outperforms state-of-the-art models in accurately predicting saliency maps.
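The abstract does not detail the architecture, so the sketch below is only a rough illustration of the general setup it names: a saliency generator paired with a discriminator, with spherical convolution approximated by a standard convolution using longitude-wrapping padding on equirectangular frames. All module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereConv(nn.Module):
    """Conv2d with longitude-wrapping padding -- a cheap stand-in for a
    true spherical convolution on equirectangular frames."""
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, k)
        self.p = k // 2

    def forward(self, x):
        x = F.pad(x, (self.p, self.p, 0, 0), mode="circular")   # wrap longitude
        x = F.pad(x, (0, 0, self.p, self.p), mode="replicate")  # clamp at poles
        return self.conv(x)

generator = nn.Sequential(        # frame -> saliency map in [0, 1]
    SphereConv(3, 16), nn.ReLU(), SphereConv(16, 1), nn.Sigmoid())
discriminator = nn.Sequential(    # (frame, map) -> probability of "real"
    SphereConv(4, 16), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid())

bce = nn.BCELoss()
frame = torch.rand(2, 3, 64, 128)   # toy equirectangular batch
gt = torch.rand(2, 1, 64, 128)      # ground-truth saliency maps

pred = generator(frame)
d_fake = discriminator(torch.cat([frame, pred], 1))
d_real = discriminator(torch.cat([frame, gt], 1))
g_loss = bce(pred, gt) + bce(d_fake, torch.ones_like(d_fake))   # fool D
d_loss = bce(d_real, torch.ones_like(d_real)) + \
         bce(d_fake.detach(), torch.zeros_like(d_fake))         # expose G
```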
Related papers
- SalFormer360: a transformer-based saliency estimation model for 360-degree videos [6.699918556514895]
We propose SalFormer360, a novel saliency estimation model for 360-degree videos built on a transformer-based architecture. Our approach combines an existing encoder architecture, SegFormer, with a custom decoder. Experiments on the three largest benchmark datasets for saliency estimation demonstrate that SalFormer360 outperforms existing state-of-the-art methods.
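As a rough illustration of the encoder-decoder split, the sketch below feeds backbone features into a small convolutional decoder that upsamples to a per-pixel saliency map; the stub encoder and all sizes are hypothetical stand-ins for the actual SegFormer backbone and custom decoder.

```python
import torch
import torch.nn as nn

class SaliencyDecoder(nn.Module):
    """Upsamples coarse encoder features to a per-pixel saliency map."""
    def __init__(self, cin=256, mid=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(cin, mid, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(mid, 1, 1), nn.Sigmoid())

    def forward(self, feats):
        return self.head(feats)

# Stand-in for the SegFormer backbone: any module mapping a frame to a
# (B, 256, H/4, W/4) feature map slots in here.
encoder = nn.Conv2d(3, 256, kernel_size=4, stride=4)

frame = torch.rand(1, 3, 128, 256)
saliency = SaliencyDecoder()(encoder(frame))   # -> (1, 1, 128, 256)
```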
arXiv Detail & Related papers (2026-02-04T14:11:00Z)
- RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization [48.99932182976206]
RPG360 is a training-free robust 360 monocular depth estimation method. We introduce a novel depth scale alignment technique using graph-based optimization. Our method achieves superior performance across diverse datasets, including Matterport3D, Stanford2D3D, and 360Loc.
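The summary does not spell out the optimization; a common formulation, sketched below under that assumption, treats each view's monocular depth as correct up to an unknown scale, collects log-scale-ratio constraints from overlapping view pairs (the graph edges), and solves a linear least-squares problem for consistent per-view scales.

```python
import numpy as np

# edges: (i, j, ratio) with ratio ~ median(depth_j / depth_i) on the
# overlap between views i and j, so log s_i - log s_j = log ratio.
edges = [(0, 1, 1.8), (1, 2, 0.7), (0, 2, 1.3)]
n_views = 3

A = np.zeros((len(edges) + 1, n_views))
b = np.zeros(len(edges) + 1)
for row, (i, j, r) in enumerate(edges):
    A[row, i], A[row, j], b[row] = 1.0, -1.0, np.log(r)
A[-1, 0] = 1.0                        # gauge fix: view 0 keeps scale 1
log_s, *_ = np.linalg.lstsq(A, b, rcond=None)
scales = np.exp(log_s)                # multiply view i's depth by scales[i]
```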
arXiv Detail & Related papers (2025-09-28T17:33:12Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation [6.832852988957967]
We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively.
Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels.
We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy.
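A minimal sketch of such a pseudo-labelling loop, with stub models and a simple crop standing in for the equirectangular-to-perspective projection (all names are hypothetical):

```python
import torch
import torch.nn.functional as F

def to_perspective(pano):
    """Stand-in for an equirectangular-to-pinhole projection (a crop here)."""
    return pano[..., 64:192, 128:256]

teacher = lambda x: x.mean(1, keepdim=True)      # frozen perspective model (stub)
student = torch.nn.Conv2d(3, 1, 3, padding=1)    # 360-degree depth network (stub)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for pano in [torch.rand(1, 3, 256, 512)]:        # unlabelled 360-degree frames
    with torch.no_grad():
        pseudo = teacher(to_perspective(pano))   # pseudo depth label
    pred = to_perspective(student(pano))         # student prediction, same view
    loss = F.l1_loss(pred, pseudo)
    opt.zero_grad(); loss.backward(); opt.step()
```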
arXiv Detail & Related papers (2024-06-18T17:59:31Z)
- R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction [53.19869886963333]
3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction.
This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction.
arXiv Detail & Related papers (2024-05-31T08:39:02Z)
- Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDMs).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
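A minimal sketch of the cascade idea, with toy callables standing in for the in-painting and artifact-removal models (all names hypothetical):

```python
import numpy as np

def refine_view(render, valid, inpaint, clean):
    """One cascade step: fill unobserved pixels, then remove artifacts."""
    filled = np.where(valid, render, inpaint(render, valid))
    return clean(filled)

# Toy stand-ins for the two diffusion models in the cascade.
inpaint = lambda img, valid: np.full_like(img, img[valid].mean())
clean = lambda img: np.clip(img, 0.0, 1.0)

render = np.random.rand(64, 64)          # novel view from the sparse model
valid = np.random.rand(64, 64) > 0.3     # True where the render is observed
refined = refine_view(render, valid, inpaint, clean)
```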
arXiv Detail & Related papers (2024-05-26T11:01:39Z)
- SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues and can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
- GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis [70.24111297192057]
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time.
The proposed method enables 2K-resolution rendering under a sparse-view camera setting.
arXiv Detail & Related papers (2023-12-04T18:59:55Z)
- 360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse Network [35.03201732370496]
Single-view depth estimation from omnidirectional images has gained popularity owing to its wide range of applications, such as autonomous driving and scene reconstruction.
In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem.
We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset.
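A minimal sketch of a generic two-branch multi-task layout (a shared encoder with separate depth and segmentation heads trained jointly); the actual SegFuse design is more elaborate, and all sizes here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    """Shared encoder with separate depth and segmentation heads."""
    def __init__(self, n_classes=13):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(32, 1, 1)
        self.seg_head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        f = self.encoder(x)
        return self.depth_head(f), self.seg_head(f)

net = TwoBranchNet()
img = torch.rand(1, 3, 128, 256)
depth, seg = net(img)
loss = F.l1_loss(depth, torch.rand_like(depth)) \
     + F.cross_entropy(seg, torch.randint(0, 13, (1, 128, 256)))
```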
arXiv Detail & Related papers (2022-02-16T11:56:31Z)
- Field-of-View IoU for Object Detection in 360° Images [36.72543749626039]
We propose two fundamental techniques for object detection in 360° images: Field-of-View IoU (FoV-IoU) and 360Augmentation.
FoV-IoU computes the intersection-over-union of two Field-of-View bounding boxes in a spherical image and can be used for training, inference, and evaluation.
360Augmentation is a data augmentation technique specific to the 360° object detection task; it randomly rotates the spherical image and mitigates the bias introduced by the sphere-to-plane projection.
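As a rough illustration, the sketch below implements the yaw component of such an augmentation as a horizontal roll of the equirectangular image, and approximates a spherical box IoU by brute-force integration weighted by cos(latitude); the paper's FoV-IoU uses field-of-view boxes and a closed-form computation instead, so this is an assumption-laden stand-in.

```python
import numpy as np

def yaw_rotate(equirect, deg):
    """360Augmentation-style yaw: roll the equirectangular image horizontally."""
    return np.roll(equirect, int(deg / 360.0 * equirect.shape[1]), axis=1)

def sphere_iou(box_a, box_b, res=256):
    """IoU of two lat-long boxes (lon_min, lon_max, lat_min, lat_max) in
    radians, integrated on a grid weighted by cos(latitude)."""
    lon = np.linspace(-np.pi, np.pi, 2 * res)
    lat = np.linspace(-np.pi / 2, np.pi / 2, res)
    L, T = np.meshgrid(lon, lat)
    w = np.cos(T)                                  # spherical area element
    inside = lambda b: (L >= b[0]) & (L <= b[1]) & (T >= b[2]) & (T <= b[3])
    a, b = inside(box_a), inside(box_b)
    union = w[a | b].sum()
    return w[a & b].sum() / union if union > 0 else 0.0

iou = sphere_iou((-0.4, 0.4, -0.2, 0.3), (-0.1, 0.6, 0.0, 0.5))
```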
arXiv Detail & Related papers (2022-02-07T14:01:59Z)
- Applying VertexShuffle Toward 360-Degree Video Super-Resolution on Focused-Icosahedral-Mesh [10.29596292902288]
We exploit the Focused Icosahedral Mesh to represent a small area and construct rotation matrices that move spherical content onto the focused mesh area.
We also propose a novel VertexShuffle operation that significantly improves both performance and efficiency.
Our proposed spherical super-resolution model achieves significant gains in both performance and inference time.
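The summary does not define the operation; one plausible reading, by analogy with PixelShuffle, folds the channel dimension into new vertices of a subdivided mesh, as sketched below (the paper's actual mapping follows the icosahedral subdivision pattern, which this toy version ignores).

```python
import numpy as np

def vertex_shuffle(feats, r):
    """Fold channels into new vertices: (V, C*r) -> (V*r, C)."""
    n_vertices, cr = feats.shape
    assert cr % r == 0
    return feats.reshape(n_vertices * r, cr // r)

up = vertex_shuffle(np.random.rand(100, 64), r=4)   # -> (400, 16)
```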
arXiv Detail & Related papers (2021-06-21T16:53:57Z)
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)