The revenge of BiSeNet: Efficient Multi-Task Image Segmentation
- URL: http://arxiv.org/abs/2404.09570v1
- Date: Mon, 15 Apr 2024 08:32:18 GMT
- Title: The revenge of BiSeNet: Efficient Multi-Task Image Segmentation
- Authors: Gabriele Rosi, Claudia Cuttano, Niccolò Cavagnero, Giuseppe Averta, Fabio Cermelli
- Abstract summary: BiSeNetFormer is a novel architecture for efficient multi-task image segmentation.
By seamlessly supporting multiple tasks, BiSeNetFormer offers a versatile solution for multi-task segmentation.
Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks.
- Score: 6.172605433695617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in image segmentation have focused on enhancing the efficiency of the models to meet the demands of real-time applications, especially on edge devices. However, existing research has primarily concentrated on single-task settings, especially on semantic segmentation, leading to redundant efforts and specialized architectures for different tasks. To address this limitation, we propose a novel architecture for efficient multi-task image segmentation, capable of handling various segmentation tasks without sacrificing efficiency or accuracy. We introduce BiSeNetFormer, which leverages the efficiency of two-stream semantic segmentation architectures and extends them into a mask classification framework. Our approach maintains the efficient spatial and context paths to capture detailed and semantic information, respectively, while leveraging an efficient transformer-based segmentation head that computes the binary masks and class probabilities. By seamlessly supporting multiple tasks, namely semantic and panoptic segmentation, BiSeNetFormer offers a versatile solution for multi-task segmentation. We evaluate our approach on the popular Cityscapes and ADE20K datasets, demonstrating impressive inference speeds while maintaining competitive accuracy compared to state-of-the-art architectures. Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks, bridging the gap between model efficiency and task adaptability.
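To make the mask-classification formulation above concrete, here is a minimal PyTorch sketch of a two-stream segmenter with a query-based head: a shallow spatial path, a downsampled context path, and a head that predicts per-query class probabilities and binary masks, which are then combined into a semantic map. The layer widths, the number of queries, and the single cross-attention layer standing in for the transformer head are illustrative assumptions, not the authors' BiSeNetFormer implementation.

```python
# Minimal, illustrative sketch of a two-stream mask-classification segmenter.
# Layer widths, query count, and the single cross-attention layer are
# assumptions for illustration only, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamMaskClassifier(nn.Module):
    def __init__(self, num_classes=19, num_queries=100, dim=128):
        super().__init__()
        # Spatial path: shallow, high-resolution stream for fine detail.
        self.spatial_path = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        # Context path: strongly downsampled stream for semantic context.
        self.context_path = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=4, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        # Learnable queries plus one cross-attention layer act as a stand-in
        # for an efficient transformer-based segmentation head.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 "no object" class
        self.mask_head = nn.Linear(dim, dim)

    def forward(self, x):
        detail = self.spatial_path(x)                       # (B, C, H/4,  W/4)
        context = self.context_path(x)                      # (B, C, H/16, W/16)
        context = F.interpolate(context, size=detail.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = detail + context                            # fuse the two streams
        B, C, H, W = fused.shape
        pixels = fused.flatten(2).transpose(1, 2)           # (B, HW, C)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)     # (B, Q, C)
        q, _ = self.cross_attn(q, pixels, pixels)
        class_logits = self.class_head(q)                   # (B, Q, K+1)
        mask_logits = torch.einsum("bqc,bchw->bqhw", self.mask_head(q), fused)
        # Semantic map: weight each query's binary mask by its class scores.
        semantic = torch.einsum("bqk,bqhw->bkhw",
                                class_logits.softmax(-1)[..., :-1],
                                mask_logits.sigmoid())
        return class_logits, mask_logits, semantic


if __name__ == "__main__":
    model = TwoStreamMaskClassifier()
    cls, masks, sem = model(torch.randn(1, 3, 512, 1024))
    print(cls.shape, masks.shape, sem.shape)  # (1,100,20) (1,100,128,256) (1,19,128,256)
```

Because the head outputs per-query class scores and binary masks rather than a fixed per-pixel class map, the same outputs can also be post-processed for panoptic-style inference, which is what makes the mask-classification framing naturally multi-task.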
Related papers
- TraceNet: Segment one thing efficiently [12.621208412232733]
We propose a one-tap-driven single-instance segmentation task that segments the single instance selected by a user via a positive tap.
We present TraceNet, which explicitly locates the selected instance by way of receptive field tracing.
We evaluate TraceNet on the instance IoU averaged over taps and on the proportion of the region within which a user tap yields a high-quality single-instance mask.
arXiv Detail & Related papers (2024-06-21T05:46:46Z)
- PEM: Prototype-based Efficient MaskFormer for Image Segmentation [10.795762739721294]
Recent transformer-based architectures have shown impressive results in the field of image segmentation.
We propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks.
arXiv Detail & Related papers (2024-02-29T18:21:54Z)
- Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z)
- Masked Supervised Learning for Semantic Segmentation [5.177947445379688]
Masked Supervised Learning (MaskSup) is an effective single-stage learning paradigm that models both short- and long-range context.
We show that the proposed method is computationally efficient, yielding a 10% improvement in mean intersection-over-union (mIoU).
arXiv Detail & Related papers (2022-10-03T13:30:19Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while being as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
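As a rough illustration of the propagation step just described (not the AuxSegNet code), the sketch below refines coarse class activation maps (CAMs) with a row-normalized pixel-level affinity matrix, random-walk style; the tensor shapes and the toy affinity built from random features are assumptions.

```python
# Illustrative sketch: propagating CAMs with a pixel-level affinity matrix.
# Not the AuxSegNet code; shapes and the toy affinity are assumptions.
import torch
import torch.nn.functional as F


def refine_cam_with_affinity(cam, affinity, num_iters=2):
    """cam: (K, H, W) class activation maps; affinity: (H*W, H*W), non-negative."""
    K, H, W = cam.shape
    # Row-normalize so each propagation step averages over similar pixels.
    trans = affinity / affinity.sum(dim=1, keepdim=True).clamp(min=1e-6)
    flat = cam.reshape(K, H * W)
    for _ in range(num_iters):
        flat = flat @ trans.t()  # spread activations along high-affinity pairs
    return flat.reshape(K, H, W)


# Toy example: affinity built from random pixel features; in the paper it is
# learned jointly from the saliency and segmentation representations.
cam = torch.rand(21, 32, 32)
feats = F.normalize(torch.rand(32 * 32, 64), dim=1)
affinity = (feats @ feats.t()).clamp(min=0)
print(refine_cam_with_affinity(cam, affinity).shape)  # torch.Size([21, 32, 32])
```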
arXiv Detail & Related papers (2021-07-25T11:39:58Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient video multi-object segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- Multi-task GANs for Semantic Segmentation and Depth Completion with Cycle Consistency [7.273142068778457]
We propose multi-task generative adversarial networks (Multi-task GANs), which are competent in semantic segmentation and depth completion.
In this paper, we improve the details of generated semantic images based on CycleGAN by introducing multi-scale spatial pooling blocks and the structural similarity reconstruction loss.
Experiments on Cityscapes dataset and KITTI depth completion benchmark show that the Multi-task GANs are capable of achieving competitive performance.
arXiv Detail & Related papers (2020-11-29T04:12:16Z)
- BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods while achieving better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
- EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion [5.815742965809424]
We propose an Efficient Panoptic Network (EPSNet) to tackle the panoptic segmentation tasks with fast inference speed.
Basically, EPSNet generates masks based on simple linear combination of prototype masks and mask coefficients.
To enhance the quality of the shared prototypes, we adopt a cross-layer attention fusion module.
arXiv Detail & Related papers (2020-03-23T09:11:44Z)
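The prototype-based mask assembly mentioned in the EPSNet summary reduces to a matrix product between shared prototype masks and per-instance coefficients; the sketch below shows the idea with made-up shapes (the prototype count and resolution are assumptions, not EPSNet's configuration).

```python
# Sketch of prototype-based mask assembly: each predicted mask is a linear
# combination of shared prototype masks and per-instance coefficients.
# Shapes here are illustrative, not EPSNet's actual configuration.
import torch

num_prototypes, H, W = 32, 128, 128
num_instances = 10

prototypes = torch.randn(num_prototypes, H, W)             # shared across instances
coefficients = torch.randn(num_instances, num_prototypes)  # predicted per instance

# Weighted sum of prototypes followed by a sigmoid gives per-instance masks.
masks = torch.einsum("np,phw->nhw", coefficients, prototypes).sigmoid()
print(masks.shape)  # torch.Size([10, 128, 128])
```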