FLIM-based Salient Object Detection Networks with Adaptive Decoders
- URL: http://arxiv.org/abs/2504.20872v1
- Date: Tue, 29 Apr 2025 15:44:02 GMT
- Title: FLIM-based Salient Object Detection Networks with Adaptive Decoders
- Authors: Gilson Junior Soares, Matheus Abrantes Cerqueira, Jancarlo F. Gomes, Laurent Najman, Silvio Jamil F. Guimarães, Alexandre Xavier Falcão
- Abstract summary: This work proposes flyweight networks, hundreds of times lighter than lightweight models, for Salient Object Detection (SOD). It combines a FLIM encoder with an adaptive decoder, whose weights are estimated for each input image by a given heuristic function. We compare FLIM models with adaptive decoders on two challenging SOD tasks against three lightweight networks from the state of the art, two FLIM networks with decoders trained by backpropagation, and one FLIM network whose labeled markers define the decoder's weights.
- Score: 40.26047220842738
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Salient Object Detection (SOD) methods can locate objects that stand out in an image, assign higher values to their pixels in a saliency map, and binarize the map, outputting a predicted segmentation mask. A recent trend is to investigate pre-trained lightweight models rather than deep neural networks for SOD tasks, to cope with applications under limited computational resources. In this context, we have investigated lightweight networks using a methodology named Feature Learning from Image Markers (FLIM), which assumes that the encoder's kernels can be estimated from marker pixels on discriminative regions of a few representative images. This work proposes flyweight networks, hundreds of times lighter than lightweight models, for SOD by combining a FLIM encoder with an adaptive decoder, whose weights are estimated for each input image by a given heuristic function. Such FLIM networks are trained from three to four representative images only and without backpropagation, making the models suitable for applications under labeled data constraints as well. We study five adaptive decoders; two of them are introduced here. Unlike the previous ones, which rely on one neuron per pixel with shared weights, the heuristic functions of the new adaptive decoders estimate the weights of each neuron per pixel. We compare FLIM models with adaptive decoders on two challenging SOD tasks against three lightweight networks from the state of the art, two FLIM networks with decoders trained by backpropagation, and one FLIM network whose labeled markers define the decoder's weights. The experiments demonstrate the advantages of the proposed networks over the baselines, revealing the importance of further investigating such methods in new applications.
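To make the adaptive-decoder idea concrete, below is a minimal sketch of a point-wise decoder whose per-channel weights are chosen for each input image by a simple heuristic. The heuristic, the threshold, and the function name are assumptions for illustration only; the paper studies five heuristic functions, including new ones that estimate a weight per neuron per pixel rather than a shared weight per channel.

```python
import numpy as np

def adaptive_decoder(feature_maps, area_threshold=0.15):
    """Illustrative point-wise adaptive decoder for FLIM-style SOD.

    feature_maps: float array of shape (C, H, W) from a FLIM encoder.
    Each channel gets a weight of +1 (foreground-like) or -1
    (background-like), decided per image: channels whose high-activation
    area covers more than `area_threshold` of the image are treated as
    background (an assumed heuristic, not the paper's exact function).
    """
    C, H, W = feature_maps.shape
    weights = np.empty(C)
    for c in range(C):
        fm = feature_maps[c]
        # Fraction of pixels whose activation exceeds the channel mean.
        active_area = np.mean(fm > fm.mean())
        weights[c] = -1.0 if active_area > area_threshold else 1.0

    # Point-wise (1x1) convolution: weighted sum over channels, then ReLU.
    saliency = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)

    # Rescale to [0, 1] so the map can be thresholded into a mask.
    if saliency.max() > 0:
        saliency /= saliency.max()
    return saliency
```

Binarizing the resulting saliency map (e.g., with Otsu's threshold) yields the predicted segmentation mask mentioned in the abstract.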
Related papers
- Multi-level Cellular Automata for FLIM networks [40.83004529604423]
We propose a new approach to deep-learning Salient Object Detection. It combines modern and classical techniques to maintain competitive performance. We show that our method is competitive with established models in the deep SOD literature.
arXiv Detail & Related papers (2025-04-15T17:22:24Z)
- Flyweight FLIM Networks for Salient Object Detection in Biomedical Images [42.763966145188625]
This study presents methods to learn dilated-separable convolutional kernels and multi-dilation layers without backpropagation for FLIM networks. It also proposes a novel network simplification method to reduce kernel redundancy and encoder size.
arXiv Detail & Related papers (2025-04-15T11:57:40Z)
- A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder [13.123714410130912]
We propose three plug-and-play decoders by employing different discretization methods of the neural memory Ordinary Differential Equations (nmODEs). These decoders integrate features at various levels of abstraction by processing information from skip connections and performing numerical operations on the upward path. In summary, the proposed discretized nmODE decoders can reduce the number of parameters by about 20%-50% and FLOPs by up to 74%, while having the potential to adapt to all U-like networks.
arXiv Detail & Related papers (2024-12-09T07:21:27Z)
- Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z)
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- Building Flyweight FLIM-based CNNs with Adaptive Decoding for Object Detection [40.97322222472642]
This work presents a method to build a Convolutional Neural Network (CNN) layer by layer for object detection from user-drawn markers.
We address the detection of Schistosomiasis mansoni eggs in microscopy images of fecal samples, and the detection of ships in satellite images.
Our CNN is thousands of times lighter than SOTA object detectors, making it suitable for CPU execution, and shows superior or equivalent performance to three methods on five measures.
arXiv Detail & Related papers (2023-06-26T16:48:20Z)
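The FLIM methodology referenced in the entry above builds convolutional layers from user-drawn markers rather than by backpropagation. The sketch below illustrates the general idea of estimating one layer's kernels from patches centered at marker pixels; the function name, patch handling, normalization, and clustering setup are simplifying assumptions, not the exact FLIM procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_flim_kernels(image, marker_coords, patch_size=3, n_kernels=8):
    """Estimate convolutional kernels from marker pixels (illustrative sketch).

    image: float array (H, W, C). marker_coords: list of (row, col) pixels
    drawn by the user on discriminative regions. Patches centered at the
    markers are clustered, and the unit-norm cluster centers become the
    kernels of one convolutional layer.
    """
    H, W, C = image.shape
    r = patch_size // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="edge")

    # Collect one flattened patch per marker pixel.
    patches = np.asarray([
        padded[row:row + patch_size, col:col + patch_size, :].ravel()
        for (row, col) in marker_coords
    ])

    # Cluster marker patches; each cluster center becomes one kernel.
    kmeans = KMeans(n_clusters=min(n_kernels, len(patches)), n_init=10)
    kmeans.fit(patches)
    kernels = kmeans.cluster_centers_.reshape(-1, patch_size, patch_size, C)

    # Unit-normalize each kernel so activations are comparable across kernels.
    norms = np.linalg.norm(kernels.reshape(len(kernels), -1), axis=1)
    return kernels / np.maximum(norms, 1e-12).reshape(-1, 1, 1, 1)
```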
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation [2.538209532048867]
This paper presents a lightweight network for head pose estimation (HPE) task.
The proposed network LwPosr uses a mixture of depthwise separable convolution (DSC) and transformer encoder layers.
arXiv Detail & Related papers (2022-02-07T22:12:27Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
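As a rough illustration of the multilevel gating described in the entry above (Suppress and Balance / GateNet), the following PyTorch sketch predicts a gate map from concatenated encoder and decoder features and uses it to suppress uninformative encoder activations before they reach the decoder. The layer sizes and the single-channel gate are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GateUnit(nn.Module):
    """Illustrative gated skip connection in the spirit of GateNet."""

    def __init__(self, enc_channels, dec_channels):
        super().__init__()
        # Predict a single gate map in [0, 1] from both feature streams.
        self.gate = nn.Sequential(
            nn.Conv2d(enc_channels + dec_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, enc_feat, dec_feat):
        # dec_feat is assumed to be upsampled to enc_feat's spatial size.
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))  # (N, 1, H, W)
        return enc_feat * g  # gated encoder features passed to the decoder
```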
This list is automatically generated from the titles and abstracts of the papers on this site.