Multi-Scale Estimation for Omni-Directional Saliency Maps Using
Learnable Equator Bias
- URL: http://arxiv.org/abs/2309.08139v1
- Date: Fri, 15 Sep 2023 04:08:20 GMT
- Title: Multi-Scale Estimation for Omni-Directional Saliency Maps Using
Learnable Equator Bias
- Authors: Takao Yamanaka, Tatsuya Suzuki, Taiki Nobutsune, Chenjunlin Wu
- Abstract summary: Saliency maps represent probability distributions of gazing points observed with a head-mounted display.
This paper proposes a novel saliency-map estimation model for the omni-directional images.
The accuracy of the saliency maps was improved by the proposed method.
- Score: 1.413861804135093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Omni-directional images have been used in a wide range of applications. For these
applications, it would be useful to estimate saliency maps representing
probability distributions of gazing points with a head-mounted display, to
detect important regions in the omni-directional images. This paper proposes a
novel saliency-map estimation model for the omni-directional images by
extracting overlapping 2-dimensional (2D) plane images from omni-directional
images at various directions and angles of view. While 2D saliency maps tend to
have high probability at the center of images (center bias), the
high-probability region appears along horizontal directions in omni-directional
saliency maps when a head-mounted display is used (equator bias). Therefore,
the 2D saliency model with a center-bias layer was fine-tuned on an
omni-directional dataset after replacing the center-bias layer with an
equator-bias layer conditioned on the elevation angle at which the 2D plane
image was extracted. The limited availability of omni-directional images in
saliency datasets can be compensated for by using a well-established 2D saliency
model pretrained on a large number of training images with ground-truth 2D
saliency maps.
In addition, this paper proposes a multi-scale estimation method that extracts
2D images at multiple angles of view to detect objects of various sizes with
variable receptive fields. The saliency maps estimated from the multiple angles
of view were integrated using pixel-wise attention weights, calculated in an
integration layer, that weight the optimal scale for each object. The proposed
method was evaluated using a publicly available dataset with evaluation metrics
for omni-directional saliency maps. It was confirmed that the accuracy of the
saliency maps was improved by the proposed method.
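As a concrete illustration of the pipeline described in the abstract, the sketch below shows (i) how an overlapping 2D plane image can be extracted from an equirectangular omni-directional image for a given viewing direction and angle of view, and (ii) how saliency maps estimated at several angles of view can be integrated with pixel-wise attention weights. This is a minimal NumPy sketch under assumed conventions (y-up coordinates, nearest-neighbour sampling, softmax attention); it is not the authors' implementation, and names such as extract_plane and fuse_scales are hypothetical.

```python
import numpy as np

def extract_plane(equi, yaw, pitch, fov_deg=90.0, out_hw=(256, 256)):
    """Sample a 2D perspective (plane) image from an equirectangular panorama.

    equi    : (H, W, C) equirectangular image.
    yaw     : azimuth of the viewing direction, in radians.
    pitch   : elevation of the viewing direction, in radians.
    fov_deg : horizontal angle of view of the virtual pinhole camera.
    out_hw  : (height, width) of the extracted plane image.
    """
    H, W = equi.shape[:2]
    h, w = out_hw
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))  # focal length in pixels

    # Pixel grid -> camera-frame rays (x right, y up, z forward).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    rays = np.stack([u - 0.5 * (w - 1),
                     0.5 * (h - 1) - v,
                     np.full_like(u, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays by pitch (about the x-axis), then yaw (about the y-axis).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, sp], [0, -sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rays = rays @ (Ry @ Rx).T

    # Rays -> longitude/latitude -> equirectangular pixel coordinates.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
    ue = (lon + np.pi) / (2.0 * np.pi) * (W - 1)
    ve = (0.5 * np.pi - lat) / np.pi * (H - 1)

    # Nearest-neighbour sampling keeps the sketch dependency-free;
    # bilinear interpolation (e.g. cv2.remap) is preferable in practice.
    return equi[np.round(ve).astype(int) % H, np.round(ue).astype(int) % W]

def fuse_scales(saliency_stack, attention_logits):
    """Integrate saliency maps estimated at multiple angles of view.

    saliency_stack   : (K, H, W) saliency maps, one per angle of view (scale).
    attention_logits : (K, H, W) outputs of an integration layer.
    Returns an (H, W) map in which each pixel is weighted toward the scale
    that best matches the object at that location.
    """
    a = np.exp(attention_logits - attention_logits.max(axis=0, keepdims=True))
    weights = a / a.sum(axis=0, keepdims=True)   # softmax over the scale axis
    return (weights * saliency_stack).sum(axis=0)
```

In this sketch, the elevation-conditioned equator bias described above would enter as an additional learnable bias term indexed by the pitch used in extract_plane, added to each plane's saliency map before (or after) fusion; the exact parameterization of that layer is specific to the paper and is not reproduced here.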
Related papers
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Image-based Geolocalization by Ground-to-2.5D Map Matching [21.21416396311102]
Methods often utilize cross-view localization techniques to match ground-view query images with 2D maps.
We propose a new approach to learning representative embeddings from multi-modal data.
By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps.
arXiv Detail & Related papers (2023-08-11T08:00:30Z)
- ${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface [4.649656275858966]
We propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface.
Specifically, we project the feature maps extracted from equirectangular images onto a unit spherical surface sampled by uniformly distributed grids.
We propose a global cross-attention-based fusion module to fuse the feature maps from skip connection and enhance the ability to obtain global context.
arXiv Detail & Related papers (2023-01-14T07:39:15Z)
- Multi-Projection Fusion and Refinement Network for Salient Object
Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z)
- SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth
Sampling [75.957103837167]
Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.
Existing works try to employ the global feature extracted from sketch to directly predict the 3D coordinates, but they usually suffer from losing fine details that are not faithful to the input sketch.
arXiv Detail & Related papers (2022-08-14T16:37:51Z)
- 2D LiDAR and Camera Fusion Using Motion Cues for Indoor Layout
Estimation [2.6905021039717987]
A ground robot explores an indoor space with a single floor and vertical walls, and collects a sequence of intensity images and 2D LiDAR datasets.
The alignment of sensor outputs and image segmentation are computed jointly by aligning LiDAR points.
The ambiguity in images for ground-wall boundary extraction is removed with the assistance of LiDAR observations.
arXiv Detail & Related papers (2022-04-24T06:26:02Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- Where am I looking at? Joint Location and Orientation Estimation by
Cross-View Matching [95.64702426906466]
Cross-view geo-localization estimates the position and orientation of a ground-level camera given a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for
Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
- Indoor Layout Estimation by 2D LiDAR and Camera Fusion [3.2387553628943535]
This paper presents an algorithm for indoor layout estimation and reconstruction through the fusion of a sequence of captured images and LiDAR data sets.
In the proposed system, a movable platform collects both intensity images and 2D LiDAR information.
arXiv Detail & Related papers (2020-01-15T16:43:35Z)