LF Tracy: A Unified Single-Pipeline Approach for Salient Object
Detection in Light Field Cameras
- URL: http://arxiv.org/abs/2401.16712v1
- Date: Tue, 30 Jan 2024 03:17:02 GMT
- Title: LF Tracy: A Unified Single-Pipeline Approach for Salient Object
Detection in Light Field Cameras
- Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong
Li, Kailun Yang
- Abstract summary: We propose an efficient paradigm to adapt light field data to enhance Salient Object Detection (SOD).
By utilizing only 28.9M parameters, the model achieves a 10% increase in accuracy with 3M additional parameters compared to its backbone using RGB images, and an 86% rise over its backbone using LF images.
- Score: 22.288764512594433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging the rich information extracted from light field (LF) cameras is
instrumental for dense prediction tasks. However, adapting light field data to
enhance Salient Object Detection (SOD) still follows the traditional RGB
methods and remains under-explored in the community. Previous approaches
predominantly employ a custom two-stream design to discover the implicit
angular feature within light field cameras, leading to significant information
isolation between different LF representations. In this study, we propose an
efficient paradigm (LF Tracy) to address this limitation. We eschew the
conventional specialized fusion and decoder architecture for a dual-stream
backbone in favor of a unified, single-pipeline approach. This comprises
firstly a simple yet effective data augmentation strategy called MixLD to
bridge the connection of spatial, depth, and implicit angular information under
different LF representations. A highly efficient information aggregation (IA)
module is then introduced to boost asymmetric feature-wise information fusion.
Owing to this innovative approach, our model surpasses the existing
state-of-the-art methods, particularly demonstrating a 23% improvement over
previous results on the latest large-scale PKU dataset. By utilizing only 28.9M
parameters, the model achieves a 10% increase in accuracy with 3M additional
parameters compared to its backbone using RGB images, and an 86% rise over its
backbone using LF images. The source code will be made publicly available at
https://github.com/FeiBryantkit/LF-Tracy.
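Note: the abstract names MixLD and the IA module but does not specify their implementation. The PyTorch-style sketch below is only an illustration of the single-pipeline idea under assumed details (batch-level mixing of RGB, depth, and focal-stack inputs through one shared backbone, followed by a lightweight feature aggregation); it is not the authors' code.

```python
import torch
import torch.nn as nn


class LFSingleStreamSketch(nn.Module):
    """Illustrative single-pipeline SOD model: one shared backbone consumes
    every LF representation, and a small head aggregates the features.
    The real MixLD / IA designs are not specified here and will differ."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.backbone = backbone                    # shared encoder, e.g. a CNN/ViT
        self.aggregate = nn.Sequential(             # stand-in for the IA module
            nn.Conv2d(feat_dim, feat_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 1, kernel_size=1),  # saliency logits
        )

    @staticmethod
    def mix_lf_inputs(rgb, depth, focal_stack, alpha=0.5):
        # Hypothetical MixLD-style augmentation: blend LF representations so the
        # shared backbone sees spatial, depth, and implicit angular cues together.
        mixed = alpha * rgb + (1.0 - alpha) * focal_stack.mean(dim=1)
        return torch.cat([rgb, depth.expand_as(rgb), mixed], dim=0)

    def forward(self, rgb, depth, focal_stack):
        # rgb: (B, 3, H, W), depth: (B, 1, H, W), focal_stack: (B, N, 3, H, W)
        x = self.backbone(self.mix_lf_inputs(rgb, depth, focal_stack))
        f_rgb, f_depth, f_mix = x.chunk(3, dim=0)   # regroup per representation
        fused = f_rgb + 0.5 * (f_depth + f_mix)     # naive asymmetric fusion
        return self.aggregate(fused)                # (B, 1, h, w) saliency map
```

The point of the sketch is that every LF representation passes through the same backbone, so no per-modality stream, fusion block, or decoder branch is required.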
Related papers
- OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic
Segmentation [51.739401680890325]
We propose a new paradigm, Omni-Aperture Fusion model (OAFuser) for light field cameras.
OAFuser discovers the angular information from sub-aperture images to generate a semantically consistent result.
Our proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and -Syn datasets.
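As a rough, assumed illustration of aggregating angular information from sub-aperture images (not the published OAFuser design), features from each view could be encoded by a shared network and pooled:

```python
import torch.nn as nn


class SubApertureFusionSketch(nn.Module):
    """Toy stand-in for fusing sub-aperture image (SAI) features; the real
    OAFuser module is more elaborate than this mean-pooling placeholder."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder                          # shared per-view feature extractor

    def forward(self, sai_stack):                       # (B, N_views, 3, H, W)
        b, n = sai_stack.shape[:2]
        feats = self.encoder(sai_stack.flatten(0, 1))   # encode every SAI view
        feats = feats.view(b, n, *feats.shape[1:])      # restore the view axis
        return feats.mean(dim=1)                        # pool the angular dimension
```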
arXiv Detail & Related papers (2023-07-28T14:43:27Z)
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
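The summary only states that LiDAR and image features are fused inside Regions of Interest; a schematic, assumed form of such RoI-level fusion (not FusionRCNN's actual second stage) is:

```python
import torch
import torch.nn as nn


class RoIFusionSketch(nn.Module):
    """Concatenate per-RoI LiDAR point features with pooled image features and
    regress a refined box; purely illustrative of RoI-level fusion."""

    def __init__(self, pts_dim=128, img_dim=128, hidden=256, box_dim=7):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Linear(pts_dim + img_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, box_dim),             # refined 3D box parameters
        )

    def forward(self, roi_point_feats, roi_image_feats):
        # roi_point_feats: (num_rois, pts_dim); roi_image_feats: (num_rois, img_dim)
        fused = torch.cat([roi_point_feats, roi_image_feats], dim=-1)
        return self.refine(fused)                   # one refined box per RoI
```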
arXiv Detail & Related papers (2022-09-22T02:07:25Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as autonomous driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Middle-level Fusion for Lightweight RGB-D Salient Object Detection [81.43951906434175]
A novel lightweight RGB-D SOD model is presented in this paper.
With IMFF and L modules incorporated in the middle-level fusion structure, our proposed model has only 3.9M parameters and runs at 33 FPS.
The experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed method over some state-of-the-art methods.
arXiv Detail & Related papers (2021-04-23T11:37:15Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
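A minimal, assumed form of the cross-modal auto-encoder pretext task (each modality reconstructed from the other, so no saliency labels are needed) might look like:

```python
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAESketch(nn.Module):
    """Toy cross-modal auto-encoder pretext: each encoder must reconstruct the
    other modality, so pre-training needs no saliency labels."""

    def __init__(self, enc_rgb, enc_depth, dec_to_depth, dec_to_rgb):
        super().__init__()
        self.enc_rgb, self.enc_depth = enc_rgb, enc_depth
        self.dec_to_depth, self.dec_to_rgb = dec_to_depth, dec_to_rgb

    def forward(self, rgb, depth):
        pred_depth = self.dec_to_depth(self.enc_rgb(rgb))    # RGB   -> depth
        pred_rgb = self.dec_to_rgb(self.enc_depth(depth))    # depth -> RGB
        return F.l1_loss(pred_depth, depth) + F.l1_loss(pred_rgb, rgb)
```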
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
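Of the three steps listed, the symmetric gated fusion can be sketched generically; the following is an assumption about the form such a module takes, not ACMNet's actual layer:

```python
import torch
import torch.nn as nn


class SymmetricGatedFusionSketch(nn.Module):
    """Generic symmetric gated fusion of two modality feature maps: the gates
    decide, per pixel, how much each branch contributes. Illustrative only."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):              # both (B, C, H, W)
        joint = torch.cat([feat_a, feat_b], dim=1)
        g_a = torch.sigmoid(self.gate_a(joint))     # weight for branch A
        g_b = torch.sigmoid(self.gate_b(joint))     # weight for branch B
        return g_a * feat_a + g_b * feat_b          # symmetric gated sum
```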
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
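How the depth map guides early and middle fusion is not detailed here; one simple, assumed form of such guidance is to modulate the RGB stream with a depth-derived attention map at two stages of a single encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthGuidedSingleStreamSketch(nn.Module):
    """Single-stream encoder in which a depth-derived attention map modulates the
    RGB features at an early and a middle stage; an assumed, simplified design."""

    def __init__(self, stage1: nn.Module, stage2: nn.Module, head: nn.Module):
        super().__init__()
        self.stage1, self.stage2, self.head = stage1, stage2, head

    def forward(self, rgb, depth):                  # depth: (B, 1, H, W)
        att = torch.sigmoid(depth)                  # crude depth attention map
        x = self.stage1(rgb * att)                  # early fusion: gate the input
        att_mid = F.interpolate(att, size=x.shape[-2:])
        x = self.stage2(x * att_mid)                # middle fusion: gate features
        return self.head(x)                         # saliency prediction
```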
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
- VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization [46.607930208613574]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.