MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360
Degree Image Saliency Prediction
- URL: http://arxiv.org/abs/2303.08525v1
- Date: Wed, 15 Mar 2023 11:15:03 GMT
- Title: MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360
Degree Image Saliency Prediction
- Authors: Pan Gao, Xinlang Chen, Rong Quan, Wei Xiang
- Abstract summary: We propose a novel multi-stage recurrent generative adversarial network for ODIs, dubbed MRGAN360.
At each stage, the prediction model takes as input the original image and the output of the previous stage and outputs a more accurate saliency map.
We employ a recurrent neural network among adjacent prediction stages to model their correlations, and exploit a discriminator at the end of each stage to supervise the output saliency map.
- Score: 10.541086214760497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to its ability to provide an immersive and interactive experience,
360 degree image content has seen rapidly growing uptake in consumer and
industrial applications. Compared to planar 2D images, saliency prediction for
360 degree images is more challenging due to their high resolutions and
spherical viewing ranges. Currently, most high-performance saliency prediction
models for omnidirectional images (ODIs) rely on deeper or broader
convolutional neural networks (CNNs), which benefit from CNNs' superior feature
representation capabilities while suffering from their high computational
costs. In this paper, inspired by the human visual cognitive process, in which
a person's perception of a visual scene is accomplished through multiple
stages of analysis, we propose a novel multi-stage recurrent generative
adversarial network for ODIs, dubbed MRGAN360, which predicts saliency maps
stage by stage. At each stage, the prediction model takes as input the original
image and the output of the previous stage and outputs a more accurate saliency
map. We employ a recurrent neural network among adjacent prediction stages to
model their correlations, and exploit a discriminator at the end of each stage
to supervise the output saliency map. In addition, we share the weights among
all the stages to obtain a lightweight architecture that is computationally
cheap. Extensive experiments demonstrate that our proposed model outperforms
state-of-the-art models in terms of both prediction accuracy and model size.
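Below is a minimal PyTorch sketch of the stage-wise recurrent refinement described in the abstract; module names, layer sizes, and the ConvGRU-style gate are illustrative assumptions, not the authors' implementation.

```python
# Sketch of multi-stage recurrent saliency refinement with shared weights.
import torch
import torch.nn as nn

class StagePredictor(nn.Module):
    """One prediction stage: refines a saliency map given the image,
    the previous stage's map, and a recurrent hidden state."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3 + 1, ch, 3, padding=1), nn.ReLU(inplace=True))
        # ConvGRU-style gate models correlations between adjacent stages
        self.gate = nn.Conv2d(ch * 2, ch, 3, padding=1)
        self.decode = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, image, prev_map, hidden):
        f = self.encode(torch.cat([image, prev_map], dim=1))
        hidden = torch.tanh(self.gate(torch.cat([f, hidden], dim=1)))
        return torch.sigmoid(self.decode(hidden)), hidden

class MRGAN360Sketch(nn.Module):
    """Runs T stages with a single shared-weight predictor."""
    def __init__(self, stages=3, ch=32):
        super().__init__()
        self.stages, self.ch = stages, ch
        self.predictor = StagePredictor(ch)  # weights shared by all stages

    def forward(self, image):
        b, _, h, w = image.shape
        sal = torch.full((b, 1, h, w), 0.5, device=image.device)  # initial guess
        hid = torch.zeros(b, self.ch, h, w, device=image.device)
        maps = []
        for _ in range(self.stages):
            sal, hid = self.predictor(image, sal, hid)
            maps.append(sal)  # each map would be scored by a stage discriminator
        return maps
```

Sharing one predictor across all stages keeps the parameter count of a T-stage model equal to that of a single stage, matching the lightweight design claimed above; during training, each stage's output map would additionally be supervised by a discriminator.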
Related papers
- pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System [0.716879432974126]
We introduce a deep convolutional model that closely approximates human visual information processing.
We aim to approximate the function of the lateral geniculate nucleus (LGN) area using a trained shallow convolutional model.
The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
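A minimal sketch of a shallow convolutional autoencoder of the kind this summary describes; the layer sizes and the mapping of streams to encoder/decoder are assumptions, not the pAE architecture itself.

```python
# Shallow convolutional autoencoder sketch for approximating LGN responses.
import torch.nn as nn

class ShallowConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # feedforward stream (downsampling)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(   # feedback/reconstruction stream
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))
```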
arXiv Detail & Related papers (2024-09-20T16:33:01Z)
- Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images [0.0]
This paper investigates the uncertainty of various deep neural networks, including ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with perturbed data.
While ResNet-50 was the most accurate single model for OOD images, the ensemble performed even better, correctly classifying all images.
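A minimal sketch of the ensemble idea: average the softmax outputs of several pretrained classifiers and take the argmax. The specific models and equal weighting are assumptions, not the paper's exact setup.

```python
# Ensemble prediction by averaging softmax probabilities of pretrained models.
import torch
from torchvision import models

nets = [models.resnet50(weights="IMAGENET1K_V2"),
        models.vgg16(weights="IMAGENET1K_V1"),
        models.densenet121(weights="IMAGENET1K_V1")]
for net in nets:
    net.eval()

@torch.no_grad()
def ensemble_predict(x):  # x: (B, 3, 224, 224), ImageNet-normalized
    probs = [net(x).softmax(dim=1) for net in nets]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```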
arXiv Detail & Related papers (2023-09-04T22:46:59Z)
- Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction, and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
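A minimal NumPy sketch of the gnomonic (tangent-plane) projection that underlies tangent-image representations of the sphere; this is the standard map projection, not the authors' code.

```python
# Gnomonic projection: sphere points onto the plane tangent at (lat0, lon0).
import numpy as np

def gnomonic(lat, lon, lat0, lon0):
    """Angles in radians; valid for points on the tangent point's hemisphere."""
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y
```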
arXiv Detail & Related papers (2023-08-24T18:07:37Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, SCAN-CNN's inference is more efficient than multi-shot fusion.
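A minimal sketch of the scale-attention idea: compute shared convolutional features at several scales and fuse them with learned per-pixel softmax weights. Channel counts and the scale set are illustrative assumptions, not the SCAN-CNN design.

```python
# Scale attention: per-pixel softmax fusion of multi-scale conv features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    def __init__(self, ch=64, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)   # shared across scales
        self.attn = nn.Conv2d(ch, len(scales), 1)     # per-pixel scale weights

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for s in self.scales:
            xs = F.interpolate(x, scale_factor=s, mode="bilinear",
                               align_corners=False) if s != 1.0 else x
            f = self.conv(xs)
            feats.append(F.interpolate(f, size=(h, w), mode="bilinear",
                                       align_corners=False))
        w_attn = self.attn(x).softmax(dim=1)           # (B, S, H, W)
        stacked = torch.stack(feats, dim=1)            # (B, S, C, H, W)
        return (w_attn.unsqueeze(2) * stacked).sum(dim=1)
```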
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
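A minimal sketch of the latent-space normalizing-flow mechanics: one affine coupling layer maps latents to a base space where a Gaussian likelihood is evaluated via the change-of-variables formula. Dimensions and the single-layer flow are assumptions; LiP-Flow additionally conditions on runtime inputs.

```python
# Affine coupling flow and latent log-likelihood via change of variables.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 128), nn.ReLU(),
                                 nn.Linear(128, dim))   # outputs scale and shift

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=1)
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)                               # stabilize the scale
        u2 = z2 * torch.exp(s) + t                      # transform half the dims
        log_det = s.sum(dim=1)
        return torch.cat([z1, u2], dim=1), log_det

def latent_log_likelihood(flow, z):
    """log p(z) under a standard-normal base density."""
    u, log_det = flow(z)
    base = -0.5 * (u.pow(2) + torch.log(torch.tensor(2 * torch.pi))).sum(dim=1)
    return base + log_det
```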
arXiv Detail & Related papers (2022-03-15T13:22:57Z)
- Capturing Omni-Range Context for Omnidirectional Segmentation [29.738065412097598]
We introduce Efficient Concurrent Attention Networks (ECANets) to bridge the gap in terms of FoV and structural distribution between the imaging domains.
We upgrade model training by leveraging multi-source and omni-supervised learning, taking advantage of both densely labeled and unlabeled data.
Our novel model, training regimen, and multi-source prediction fusion elevate the performance (mIoU) to new state-of-the-art results.
arXiv Detail & Related papers (2021-03-09T19:46:09Z)
- Perceiver: General Perception with Iterative Attention [85.65927856589613]
We introduce the Perceiver - a model that builds upon Transformers.
We show that this architecture performs competitively or beyond strong, specialized models on classification tasks.
It also surpasses state-of-the-art results for all modalities in AudioSet.
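A minimal sketch of the Perceiver's core mechanism: a small learned latent array cross-attends to a large input array, so compute depth is decoupled from input size. Sizes and the single-block layout are illustrative assumptions.

```python
# Perceiver-style block: latents cross-attend to inputs, then self-attend.
import torch
import torch.nn as nn

class PerceiverBlock(nn.Module):
    def __init__(self, dim=256, n_latents=64, n_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim) * 0.02)
        self.cross = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                nn.Linear(dim * 4, dim))

    def forward(self, inputs):                # inputs: (B, N, dim), N can be huge
        lat = self.latents.unsqueeze(0).expand(inputs.size(0), -1, -1)
        lat = lat + self.cross(lat, inputs, inputs)[0]  # latents attend to inputs
        lat = lat + self.self_attn(lat, lat, lat)[0]    # latent self-attention
        return lat + self.ff(lat)             # (B, n_latents, dim)
```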
arXiv Detail & Related papers (2021-03-04T18:20:50Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.