PIDNet: A Real-time Semantic Segmentation Network Inspired by PID
Controllers
- URL: http://arxiv.org/abs/2206.02066v3
- Date: Fri, 7 Apr 2023 01:10:17 GMT
- Title: PIDNet: A Real-time Semantic Segmentation Network Inspired by PID
Controllers
- Authors: Jiacong Xu, Zixiang Xiong and Shankar P. Bhattacharyya
- Abstract summary: Two-branch network architecture has shown its efficiency and effectiveness in real-time semantic segmentation tasks.
We propose a novel three-branch network architecture: PIDNet, which contains three branches to parse detailed, context and boundary information.
Our family of PIDNets achieves the best trade-off between inference speed and accuracy, and their accuracy surpasses that of all existing models with similar inference speed on the Cityscapes and CamVid datasets.
- Score: 6.0653144230649865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-branch network architecture has shown its efficiency and effectiveness in
real-time semantic segmentation tasks. However, direct fusion of
high-resolution details and low-frequency context has the drawback of detailed
features being easily overwhelmed by surrounding contextual information. This
overshoot phenomenon limits the improvement of the segmentation accuracy of
existing two-branch models. In this paper, we make a connection between
Convolutional Neural Networks (CNN) and Proportional-Integral-Derivative (PID)
controllers and reveal that a two-branch network is equivalent to a
Proportional-Integral (PI) controller, which inherently suffers from similar
overshoot issues. To alleviate this problem, we propose a novel three-branch
network architecture: PIDNet, which contains three branches to parse detailed,
context and boundary information, respectively, and employs boundary attention
to guide the fusion of detailed and context branches. Our family of PIDNets
achieves the best trade-off between inference speed and accuracy, and their
accuracy surpasses that of all existing models with similar inference speed on
the Cityscapes and CamVid datasets. Specifically, PIDNet-S achieves 78.6% mIoU
with an inference speed of 93.2 FPS on Cityscapes and 80.1% mIoU with a speed
of 153.7 FPS on CamVid.
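The boundary-guided fusion described in the abstract can be illustrated with a small sketch. The gating rule used here (a sigmoid of the boundary response weights the detail feature, and its complement weights the context feature) is an assumption made for illustration, since the abstract does not specify the exact operator. The names `sigmoid` and `boundary_guided_fusion` are hypothetical, and features are reduced to one scalar per pixel to keep the example short.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def boundary_guided_fusion(detail, context, boundary):
    """Blend per-pixel detail and context features using a boundary gate.

    Assumed rule (illustrative, not taken from the abstract):
        fused = g * detail + (1 - g) * context,  where g = sigmoid(boundary)
    """
    if not (len(detail) == len(context) == len(boundary)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, c, b in zip(detail, context, boundary):
        g = sigmoid(b)  # gate close to 1 where the boundary response is strong
        fused.append(g * d + (1.0 - g) * c)
    return fused


if __name__ == "__main__":
    detail = [1.0, 1.0, 1.0]      # hypothetical detail-branch responses
    context = [0.0, 0.0, 0.0]     # hypothetical context-branch responses
    boundary = [-6.0, 0.0, 6.0]   # weak, neutral, and strong boundary responses
    print(boundary_guided_fusion(detail, context, boundary))
```

Under this assumed rule, pixels with a strong boundary response follow the detail feature, while pixels far from boundaries follow the context feature, which is one way to keep fine details from being overwhelmed by surrounding context.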
Related papers
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- Lightweight and Progressively-Scalable Networks for Semantic Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation.
In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interactions across multiple scales.
We devise Lightweight and Progressively-Scalable Networks (LPS-Net) that expand network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z)
- FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation [23.25851281719734]
We propose a Fast Bilateral Symmetrical Network (FBSNet) for real-time semantic segmentation.
FBSNet employs a symmetrical-decoder structure with two branches: a semantic information branch and a spatial detail branch.
Experimental results on Cityscapes and CamVid show that the proposed FBSNet strikes a good balance between accuracy and efficiency.
arXiv Detail & Related papers (2021-09-02T04:16:39Z)
- MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation [27.232578592161673]
We devise a novel lightweight network using a multi-scale context fusion scheme (MSCFNet).
The proposed MSCFNet contains only 1.15M parameters, achieves 71.9% Mean IoU and can run at over 50 FPS on a single Titan XP GPU configuration.
arXiv Detail & Related papers (2021-03-24T08:28:26Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
- Real-time Semantic Segmentation with Context Aggregation Network [14.560708848716754]
We propose a dual branch convolutional neural network, with significantly lower computational costs as compared to the state-of-the-art.
We evaluate our method on two semantic segmentation datasets, namely Cityscapes dataset and UAVid dataset.
arXiv Detail & Related papers (2020-11-02T14:16:23Z)
- Dense Dual-Path Network for Real-time Semantic Segmentation [7.8381744043673045]
We introduce a novel Dense Dual-Path Network (DDPNet) for real-time semantic segmentation under resource constraints.
DDPNet achieves 75.3% mIoU with 52.6 FPS for an input of 1024 X 2048 resolution on a single GTX 1080Ti card.
arXiv Detail & Related papers (2020-10-21T06:11:41Z)
- BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1, input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- Toward fast and accurate human pose estimation via soft-gated skip connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation.
We re-analyze this design choice in the context of improving both the accuracy and the efficiency over the state-of-the-art.
Our model achieves state-of-the-art results on the MPII and LSP datasets.
arXiv Detail & Related papers (2020-02-25T18:51:51Z)