Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image
- URL: http://arxiv.org/abs/2212.12378v1
- Date: Fri, 23 Dec 2022 14:50:40 GMT
- Title: Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image
- Authors: Runmin Cong, Ke Huang, Jianjun Lei, Yao Zhao, Qingming Huang, and Sam
Kwong
- Abstract summary: We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
- Score: 141.10227079090419
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality technology, 360° omnidirectional images have become widely used, but the SOD task in 360° omnidirectional images is seldom studied because of their severe distortions and complex scenes. In this paper, we propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images. Unlike existing methods, the equirectangular projection image and four corresponding cube-unfolding images are fed into the network simultaneously as inputs, where the cube-unfolding images not only provide supplementary information for the equirectangular projection image but also preserve object integrity under the cube-map projection. To make full use of these two projection modes, a Dynamic Weighting Fusion (DWF) module is designed to adaptively integrate the features of the different projections in a complementary and dynamic manner, from the perspective of both inter- and intra-feature relations. Furthermore, to fully exploit the interaction between encoder and decoder features, a Filtration and Refinement (FR) module is designed to suppress redundant information both within each feature and between features. Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms state-of-the-art methods both qualitatively and quantitatively.
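To make the fusion idea concrete, the following is a minimal sketch (in PyTorch) of a dynamic-weighting fusion step in the spirit of the DWF module: two spatially aligned feature maps, one from the equirectangular branch and one from the cube-unfolding branch, are combined with content-dependent weights. The module name, layer choices, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code) of a dynamic-weighting
# fusion step in the spirit of the DWF module. All names and layer choices
# are illustrative assumptions.
import torch
import torch.nn as nn


class DynamicWeightingFusion(nn.Module):
    """Fuse equirectangular (ERP) and cube-unfolding (CMP) features with
    content-dependent weights predicted from the features themselves."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one scalar weight per projection branch from the
        # concatenated feature maps (global pooling -> 1x1 conv -> softmax).
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, erp_feat: torch.Tensor, cmp_feat: torch.Tensor) -> torch.Tensor:
        # erp_feat, cmp_feat: (B, C, H, W) feature maps, spatially aligned.
        weights = self.weight_head(torch.cat([erp_feat, cmp_feat], dim=1))  # (B, 2, 1, 1)
        w_erp, w_cmp = weights[:, 0:1], weights[:, 1:2]
        fused = w_erp * erp_feat + w_cmp * cmp_feat  # complementary, per-image weighting
        return self.refine(fused)


if __name__ == "__main__":
    dwf = DynamicWeightingFusion(channels=64)
    erp = torch.randn(2, 64, 32, 64)  # ERP-branch features
    cmp = torch.randn(2, 64, 32, 64)  # cube-unfolding-branch features
    print(dwf(erp, cmp).shape)        # torch.Size([2, 64, 32, 64])
```

Note that the DWF module described in the abstract weights features from both inter- and intra-feature perspectives; the sketch above only captures the simpler per-branch weighting idea.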
Related papers
- Context and Geometry Aware Voxel Transformer for Semantic Scene Completion [7.147020285382786]
Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks.
Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images.
We introduce a neural network named CGFormer to achieve semantic scene completion.
arXiv Detail & Related papers (2024-05-22T14:16:30Z)
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion [8.168523242105763]
We will introduce a novel vision transformer-based 3D object detection model, namely FusionViT.
Our FusionViT model can achieve state-of-the-art performance and outperforms existing baseline methods.
arXiv Detail & Related papers (2023-11-07T00:12:01Z)
- SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval [8.74845857766369]
Multi-modality 3D object retrieval is rarely developed and analyzed on large-scale datasets.
We propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval.
arXiv Detail & Related papers (2023-07-20T05:46:32Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection [16.198358858773258]
Multi-modal 3D object detection has been an active research topic in autonomous driving.
It is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels.
Recent approaches either fuse the image features with the point cloud features that are projected onto the 2D image plane or combine the sparse point cloud with dense image pixels.
arXiv Detail & Related papers (2022-10-18T06:15:56Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)