SODAWideNet -- Salient Object Detection with an Attention augmented Wide
Encoder Decoder network without ImageNet pre-training
- URL: http://arxiv.org/abs/2311.04828v2
- Date: Thu, 9 Nov 2023 01:49:28 GMT
- Title: SODAWideNet -- Salient Object Detection with an Attention augmented Wide
Encoder Decoder network without ImageNet pre-training
- Authors: Rohit Venkata Sai Dulam and Chandra Kambhamettu
- Abstract summary: We explore developing a neural network from scratch directly trained on Salient Object Detection without ImageNet pre-training.
We propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection.
Two variants, SODAWideNet-S (3.03M) and SODAWideNet (9.03M), achieve competitive performance against state-of-the-art models on five datasets.
- Score: 3.66237529322911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing a new Salient Object Detection (SOD) model involves selecting an
ImageNet pre-trained backbone and creating novel feature refinement modules to
use backbone features. However, adding new components to a pre-trained backbone
requires retraining the entire network on the ImageNet dataset, which takes
significant time. Hence, we explore developing a neural network from scratch
directly trained on SOD without ImageNet pre-training. Such a formulation
offers full autonomy to design task-specific components. To that end, we
propose SODAWideNet, an encoder-decoder-style network for Salient Object
Detection. We deviate from the commonly practiced paradigm of narrow and deep
convolutional models to a wide and shallow architecture, resulting in a
parameter-efficient deep neural network. To achieve a shallower network, we
increase the receptive field from the beginning of the network using a
combination of dilated convolutions and self-attention. Accordingly, we propose
the Multi Receptive Field Feature Aggregation Module (MRFFAM), which efficiently
obtains discriminative features from farther regions at higher resolutions
using dilated convolutions. Next, we propose Multi-Scale Attention (MSA), which
creates a feature pyramid and efficiently computes attention across multiple
resolutions to extract global features from larger feature maps. Finally, we
propose two variants, SODAWideNet-S (3.03M) and SODAWideNet (9.03M), that
achieve competitive performance against state-of-the-art models on five
datasets.
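As a minimal illustration of why dilated convolutions allow a shallower network, the receptive-field arithmetic behind stacked stride-1 convolutions can be sketched as follows. The dilation rates below are hypothetical examples, not the values used in SODAWideNet:

```python
def receptive_field(layers):
    """Effective receptive field of stacked stride-1 convolutions.

    Each layer is a (kernel_size, dilation) pair. With stride 1, every
    layer adds (kernel_size - 1) * dilation pixels to the receptive field.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

# Three plain 3x3 convolutions cover only a 7-pixel span...
plain = [(3, 1)] * 3
# ...while the same depth with (illustrative) growing dilations covers 25,
# which is how a shallow, wide network can see far at high resolution.
dilated = [(3, 1), (3, 3), (3, 8)]
print(receptive_field(plain))    # -> 7
print(receptive_field(dilated))  # -> 25
```

The same layer budget buys a much larger receptive field, which is the motivation stated above for using dilated convolutions from the beginning of the network.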
Related papers
- SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection [3.2586315449885106]
We propose a novel encoder-decoder-style neural network called SODAWideNet++ designed explicitly for Salient Object Detection.
Inspired by vision transformers' ability to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module.
In contrast to the current paradigm of ImageNet pre-training, we modify 118K annotated images from the COCO semantic segmentation dataset by binarizing the annotations to pre-train the proposed model end-to-end.
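The label conversion described above can be sketched in a few lines: a COCO-style semantic mask (0 for background, a positive class id elsewhere) is collapsed into a binary foreground/background mask. This is an illustrative sketch of the general idea, not the paper's published preprocessing code:

```python
import numpy as np

def binarize_mask(semantic_mask: np.ndarray) -> np.ndarray:
    """Map every labelled (non-zero) pixel to 1 and background to 0."""
    return (semantic_mask > 0).astype(np.uint8)

# A toy 3x3 semantic mask with two object classes (3 and 7).
mask = np.array([[0, 3, 3],
                 [0, 0, 7],
                 [1, 0, 0]])
print(binarize_mask(mask))
# [[0 1 1]
#  [0 0 1]
#  [1 0 0]]
```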
arXiv Detail & Related papers (2024-08-29T15:51:06Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - PointeNet: A Lightweight Framework for Effective and Efficient Point
Cloud Analysis [28.54939134635978]
PointeNet is a network designed specifically for point cloud analysis.
Our method demonstrates flexibility by seamlessly integrating with a classification/segmentation head or embedding into off-the-shelf 3D object detection networks.
Experiments on object-level datasets, including ModelNet40, ScanObjectNN, and ShapeNet, and on the scene-level dataset KITTI, demonstrate the superior performance of PointeNet over state-of-the-art methods in point cloud analysis.
arXiv Detail & Related papers (2023-12-20T03:34:48Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud
Representation [65.4396959244269]
The paper tackles the challenge of combining SO(3) equivariance with binarization by designing a general framework for constructing 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a strong trade-off between efficiency, rotation robustness, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - An Efficient End-to-End 3D Model Reconstruction based on Neural
Architecture Search [5.913946292597174]
We propose an efficient model reconstruction method utilizing neural architecture search (NAS) and binary classification.
Our method achieves significantly higher reconstruction accuracy using fewer network parameters.
arXiv Detail & Related papers (2022-02-27T08:53:43Z) - Recalibration of Neural Networks for Point Cloud Analysis [3.7814216736076434]
We introduce re-calibration modules on deep neural networks for 3D point clouds.
We demonstrate the benefit and versatility of our proposed modules by incorporating them into three state-of-the-art networks for 3D point cloud analysis.
In the second set of experiments, we investigate the benefits of re-calibration blocks on Alzheimer's Disease diagnosis.
arXiv Detail & Related papers (2020-11-25T17:14:34Z) - Learning Deep Interleaved Networks with Asymmetric Co-Attention for
Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) organizes the network as a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - BiO-Net: Learning Recurrent Bi-directional Connections for
Encoder-Decoder Architecture [82.64881585566825]
We present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters.
Our method significantly outperforms the vanilla U-Net as well as other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-01T05:07:49Z) - DRU-net: An Efficient Deep Convolutional Neural Network for Medical
Image Segmentation [2.3574651879602215]
Residual network (ResNet) and densely connected network (DenseNet) have significantly improved the training efficiency and performance of deep convolutional neural networks (DCNNs).
We propose an efficient network architecture by considering advantages of both networks.
arXiv Detail & Related papers (2020-04-28T12:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.