JetSeg: Efficient Real-Time Semantic Segmentation Model for Low-Power
GPU-Embedded Systems
- URL: http://arxiv.org/abs/2305.11419v1
- Date: Fri, 19 May 2023 04:07:26 GMT
- Title: JetSeg: Efficient Real-Time Semantic Segmentation Model for Low-Power
GPU-Embedded Systems
- Authors: Miguel Lopez-Montiel, Daniel Alejandro Lopez, Oscar Montiel
- Abstract summary: We propose an efficient model for real-time semantic segmentation called JetSeg.
JetSeg consists of an encoder called JetNet and an improved RegSeg decoder.
Our approach outperforms state-of-the-art real-time encoder-decoder models, using 46.70M fewer parameters and 5.14% fewer GFLOPs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real-time semantic segmentation is a challenging task that requires
high-accuracy models with low inference times. Implementing these models on
embedded systems is limited by hardware capability and memory usage, which
produces bottlenecks. We propose an efficient model for real-time semantic
segmentation called JetSeg, consisting of an encoder called JetNet, and an
improved RegSeg decoder. JetNet is designed for GPU-Embedded Systems and
includes two main components: JetBlock, a new light-weight efficient block
that reduces the number of parameters, minimizing memory usage and inference
time without sacrificing accuracy; and JetConv, a new strategy that combines
asymmetric and non-asymmetric convolutions with depthwise-dilated
convolutions, together with a channel shuffle operation, light-weight
activation functions, and a number of group convolutions convenient for
embedded systems. We also introduce an innovative loss function named
JetLoss, which integrates the Precision, Recall, and IoUB losses to improve
semantic segmentation and reduce computational complexity. Experiments
demonstrate that JetSeg is much faster on workstation devices and more suitable
for Low-Power GPU-Embedded Systems than existing state-of-the-art models for
real-time semantic segmentation. Our approach outperforms state-of-the-art
real-time encoder-decoder models, using 46.70M fewer parameters and 5.14%
fewer GFLOPs, which makes JetSeg up to 2x faster than other models on the
NVIDIA Titan RTX GPU and the Jetson Xavier. The JetSeg code is available at
https://github.com/mmontielpz/jetseg.
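The abstract names the main ingredients of JetNet (asymmetric and non-asymmetric convolutions, depthwise-dilated convolutions, channel shuffle, group convolutions, light-weight activations) and of JetLoss (Precision, Recall, and IoUB terms). The PyTorch sketch below only illustrates how such pieces are typically wired together; the module names, channel counts, kernel sizes, activation choice, and equal loss weighting are assumptions and do not reproduce the authors' implementation (see the linked repository for the real code).

```python
# Illustrative sketch only: a JetConv-style block and a combined
# precision/recall/IoU loss assembled from the ingredients named in the
# abstract. Names, hyperparameters, and the loss weighting are assumptions,
# not the official JetSeg implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups (ShuffleNet-style)."""
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)


class JetConvSketch(nn.Module):
    """Asymmetric (1xk, kx1) depthwise convs in parallel with a
    depthwise-dilated conv, fused by a grouped 1x1 conv and shuffled.
    `channels` must be divisible by `groups`."""

    def __init__(self, channels: int, k: int = 3, dilation: int = 2, groups: int = 4):
        super().__init__()
        pad = k // 2
        self.asym = nn.Sequential(  # asymmetric depthwise pair
            nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels, bias=False),
            nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels, bias=False),
        )
        self.dw_dilated = nn.Conv2d(  # non-asymmetric depthwise-dilated conv
            channels, channels, k, padding=dilation * pad, dilation=dilation,
            groups=channels, bias=False)
        self.pw = nn.Conv2d(2 * channels, channels, 1, groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU6(inplace=True)  # a light-weight activation (assumption)
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branches = torch.cat([self.asym(x), self.dw_dilated(x)], dim=1)
        out = self.act(self.bn(self.pw(branches)))
        return channel_shuffle(out, self.groups) + x  # residual connection


def jetloss_sketch(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft precision/recall/IoU terms on per-class probability maps,
    averaged with equal weights (the real JetLoss may weight them differently)."""
    prob = torch.sigmoid(logits)                      # (N, C, H, W)
    tp = (prob * target).sum(dim=(2, 3))
    fp = (prob * (1 - target)).sum(dim=(2, 3))
    fn = ((1 - prob) * target).sum(dim=(2, 3))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    return ((1 - precision) + (1 - recall) + (1 - iou)).mean() / 3.0
```

For example, `JetConvSketch(64)(torch.randn(1, 64, 128, 128))` preserves the 64x128x128 shape, which is what allows blocks of this kind to be stacked inside an encoder stage.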
Related papers
- LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones [10.435069781620957]
Research in efficient vision backbones is evolving into models that are a mixture of convolutions and transformer blocks.
We analyze common modules and architectural design choices for backbones not in terms of MACs, but rather in actual throughput and latency.
We combine both macro and micro design to create a new family of hardware-efficient backbone networks called LowFormer.
arXiv Detail & Related papers (2024-09-05T12:18:32Z)
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and utilized, putting a strain on both advanced and limited devices.
Our solution is to implement weight block sparsity, which is a structured sparsity that is hardware-friendly.
We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16.
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, and it meets real-time requirements (FPS>=30) on the edge-computing device Nvidia
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Particle Transformer for Jet Tagging [4.604003661048267]
We present JetClass, a new comprehensive dataset for jet tagging.
The dataset consists of 100 M jets, about two orders of magnitude larger than existing public datasets.
We propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT)
arXiv Detail & Related papers (2022-02-08T10:36:29Z)
- Edge Federated Learning Via Unit-Modulus Over-The-Air Computation (Extended Version) [64.76619508293966]
This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning.
It simultaneously uploads local model parameters and updates global model parameters via analog beamforming.
We demonstrate the implementation of UM-AirComp in a vehicle-to-everything autonomous driving simulation platform.
arXiv Detail & Related papers (2021-01-28T15:10:22Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use a shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation [95.47168925127089]
We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder.
We design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features.
arXiv Detail & Related papers (2020-12-21T18:58:18Z)
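The HyperSeg entry above rests on the idea that the encoder generates the weights of the decoder. As a rough illustration of that weight-generation mechanism only (not the paper's architecture, which uses a nested U-Net hypernetwork and patch-wise weights), the sketch below has a small head predict per-image parameters of a 1x1 decoder convolution from pooled encoder features; all names and sizes are assumptions.

```python
# Rough illustration of "the encoder generates the decoder weights": a linear
# hypernetwork predicts per-image 1x1-conv weights from pooled encoder
# features. Names and sizes are assumptions, not HyperSeg's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicDecoderHead(nn.Module):
    def __init__(self, feat_ch: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        # Hypernetwork: pooled encoder descriptor -> weights and biases
        # of a per-image 1x1 classification conv.
        self.hyper = nn.Linear(feat_ch, num_classes * feat_ch + num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feats.shape
        ctx = F.adaptive_avg_pool2d(feats, 1).flatten(1)           # (N, C)
        params = self.hyper(ctx)                                   # (N, K*C + K)
        w_dyn = params[:, : self.num_classes * c].reshape(n * self.num_classes, c, 1, 1)
        b_dyn = params[:, self.num_classes * c:].reshape(n * self.num_classes)
        # Grouped-conv trick: fold the batch into channels so that each image
        # is convolved with its own generated weights.
        out = F.conv2d(feats.reshape(1, n * c, h, w), w_dyn, b_dyn, groups=n)
        return out.reshape(n, self.num_classes, h, w)
```

For example, `DynamicDecoderHead(256, 19)(torch.randn(2, 256, 64, 64))` returns a (2, 19, 64, 64) logit map; HyperSeg refines this idea by predicting spatially varying (patch-wise) weights rather than one set per image.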