ASAP: Accurate semantic segmentation for real time performance
- URL: http://arxiv.org/abs/2210.01323v1
- Date: Tue, 4 Oct 2022 02:35:53 GMT
- Title: ASAP: Accurate semantic segmentation for real time performance
- Authors: Jaehyun Park, Subin Lee, Eon Kim, Byeongjun Moon, Dabeen Yu, Yeonseung Yu, Junghwan Kim
- Abstract summary: We propose an efficient feature fusion method, Feature Fusion with Different Norms (FFDN).
FFDN utilizes the rich global context of multi-level scales and a vertical pooling module before self-attention.
We achieve a mean Intersection-over-Union (mIoU) of 73.1 and 191 Frames Per Second (FPS), results comparable to the state of the art on the Cityscapes test dataset.
- Score: 3.5327983932835165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature fusion modules over encoder features and self-attention modules have been
widely adopted in semantic segmentation. However, these modules are computationally costly,
which limits their use in real-time environments. In addition, segmentation performance
suffers in autonomous driving scenes, which contain a lot of contextual information
perpendicular to the road surface, such as people, buildings, and general objects. In this
paper, we propose an efficient feature fusion method, Feature Fusion with Different Norms
(FFDN), which utilizes the rich global context of multi-level scales together with a vertical
pooling module placed before self-attention; this preserves most contextual information while
reducing the complexity of global context encoding in the vertical direction. In this way, we
capture the properties of the representation in global space while reducing the additional
computational cost. We also analyze low-performance cases, including small and vertically
featured objects. We achieve a mean Intersection-over-Union (mIoU) of 73.1 and 191 Frames Per
Second (FPS), results comparable to the state of the art on the Cityscapes test dataset.
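The abstract comes with no code, but the core mechanism is concrete enough to sketch. Below is a minimal PyTorch illustration of vertical pooling before self-attention, under our own assumptions (the module and all names are hypothetical, not the authors' implementation): average-pooling the feature map along the height axis leaves one token per image column, so attention cost drops from O((HW)^2) to O(W^2) while horizontal global context is preserved.

```python
# A minimal PyTorch sketch (not the authors' code) of vertical pooling
# before self-attention as described in the abstract. Module and
# parameter names here are hypothetical: the feature map is average-
# pooled along the height axis, self-attention runs over the remaining
# per-column tokens, and the attended context is broadcast back.
import torch
import torch.nn as nn

class VerticalPoolAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Collapse H -> 1 so attention cost scales with W, not H * W.
        self.vpool = nn.AdaptiveAvgPool2d((1, None))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        ctx = self.vpool(x)                     # (B, C, 1, W)
        ctx = ctx.squeeze(2).transpose(1, 2)    # (B, W, C): one token per column
        ctx = self.norm(ctx)
        ctx, _ = self.attn(ctx, ctx, ctx)       # attention over W tokens only
        ctx = ctx.transpose(1, 2).unsqueeze(2)  # back to (B, C, 1, W)
        return x + ctx.expand(-1, -1, h, -1)    # broadcast context over height

x = torch.randn(2, 64, 128, 256)
print(VerticalPoolAttention(64)(x).shape)       # torch.Size([2, 64, 128, 256])
```

The residual add at the end is our choice; the paper combines this module with multi-level feature fusion under different norms, which the sketch does not attempt to reproduce.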
Related papers
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries.
We show that the proposed method sets new state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- Spatial-Temporal Multi-level Association for Video Object Segmentation [89.32226483171047]
This paper proposes spatial-temporal multi-level association, which jointly associates reference frame, test frame, and object features.
Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features.
arXiv Detail & Related papers (2024-04-09T12:44:34Z)
- DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation [35.6022448037063]
Per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics.
Existing approaches rely on one of two scales: local information (i.e., the bounding box proportions) or global information.
Our work aims to strengthen both local and global cues.
arXiv Detail & Related papers (2024-01-06T10:56:36Z)
- Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS).
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust spatio-temporal segmentation correspondences in videos (a rough sketch follows this entry).
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z)
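As referenced above, here is a hedged sketch (ours, not the paper's code) of that insight: patch tokens from a DINO-pretrained ViT are compared across frames by cosine similarity, and a mask is carried along the top-k matches. The function names and the top-k voting scheme are our assumptions.

```python
# A hedged sketch (ours, not the paper's code) of the stated insight:
# patch tokens from a DINO-pretrained ViT are compared across frames by
# cosine similarity, and a mask is carried along the top-k matches.
# Both function names and the top-k voting scheme are our assumptions.
import torch
import torch.nn.functional as F

def frame_affinity(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """feats_*: (N, C) patch tokens of one frame from a DINO-pretrained ViT."""
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    return a @ b.t()                       # (N, N) cosine affinity between patches

def propagate_mask(affinity: torch.Tensor, mask_b: torch.Tensor,
                   top_k: int = 5) -> torch.Tensor:
    """Carry a per-patch mask (N,) from frame B to frame A via top-k matches."""
    vals, idx = affinity.topk(top_k, dim=-1)    # best matches in frame B
    weights = vals.softmax(dim=-1)              # (N, k) soft vote weights
    return (weights * mask_b[idx]).sum(dim=-1)  # weighted vote per patch in A
```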
- On Efficient Real-Time Semantic Segmentation: A Survey [12.404169549562523]
We take a look at works that aim to address the misalignment between accuracy-oriented models and real-time deployment, using more compact and efficient models capable of running on low-memory embedded systems.
We evaluate the inference speed of the discussed models under consistent hardware and software setups.
Our experimental results demonstrate that many works are capable of real-time performance on resource-constrained hardware, while illustrating the consistent trade-off between latency and accuracy.
arXiv Detail & Related papers (2022-06-17T08:00:27Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- A Unified Efficient Pyramid Transformer for Semantic Segmentation [40.20512714144266]
We advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts.
We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling.
We demonstrate promising performance on three popular benchmarks for semantic segmentation with low memory footprint.
arXiv Detail & Related papers (2021-07-29T17:47:32Z)
- AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [12.409365458889082]
We propose a new model, called Attention-Augmented Network (AttaNet), to capture both global context and multilevel semantics.
AttaNet consists of two primary modules: the Strip Attention Module (SAM) and the Attention Fusion Module (AFM); a rough sketch of strip-style attention follows this entry.
arXiv Detail & Related papers (2021-03-10T08:38:29Z)
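As noted above, here is a minimal sketch, under our own assumptions, of strip-style attention in the spirit of AttaNet's SAM; the exact striping used by the paper may differ, and all names below are illustrative. Each pixel attends to a vertically pooled strip rather than to every other pixel, so the attention map is (HW x H) instead of (HW x HW).

```python
# A minimal sketch, under our own assumptions, of strip-style attention
# in the spirit of AttaNet's SAM (the paper's exact striping may differ;
# all names are illustrative). Each pixel attends to a vertically pooled
# strip rather than to every other pixel, so the attention map is
# (HW x H) instead of (HW x HW).
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        strip = x.mean(dim=3, keepdim=True)           # (B, C, H, 1) column average
        q = self.q(x).flatten(2).transpose(1, 2)      # (B, HW, C)
        k = self.k(strip).flatten(2)                  # (B, C, H)
        v = self.v(strip).flatten(2).transpose(1, 2)  # (B, H, C)
        attn = (q @ k).softmax(dim=-1)                # (B, HW, H)
        out = (attn @ v).transpose(1, 2).view(b, c, h, w)
        return x + out                                # residual fusion
```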
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve objects' inner consistency by modeling the global context, or to refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low and high frequency of the image, respectively (a minimal decoupling sketch follows this entry).
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
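As referenced in the entry above, here is a rough sketch of body/edge decoupling under our own assumptions; the authors' actual implementation learns the decoupling, so this only captures the low/high-frequency intuition.

```python
# A rough sketch (our reading, not the authors' implementation, which
# learns the decoupling) of the body/edge split as a frequency split:
# the body is a smoothed, low-frequency version of the feature map and
# the edge is the high-frequency residual, so each part can be
# supervised separately.
import torch
import torch.nn.functional as F

def decouple_body_edge(x: torch.Tensor, scale: int = 4):
    """x: (B, C, H, W) feature map. Returns (body, edge)."""
    b, c, h, w = x.shape
    low = F.interpolate(x, scale_factor=1.0 / scale, mode="bilinear",
                        align_corners=False)
    body = F.interpolate(low, size=(h, w), mode="bilinear",
                         align_corners=False)   # low-frequency body
    edge = x - body                             # high-frequency residual
    return body, edge
```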
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for real-time semantic segmentation of high-resolution images and videos.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches (a sketch of the reordered attention follows this entry).
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
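As noted above, the fast-attention trick can be sketched in a few lines. This is our hedged reconstruction of the idea as we read the abstract, not the authors' code: replacing the softmax with L2 normalization of queries and keys makes attention associative, so (Q K^T) V can be computed as Q (K^T V), which is linear rather than quadratic in the number of pixels.

```python
# A hedged reconstruction (ours, not the authors' code) of the fast
# attention idea as we read the abstract: L2-normalizing queries and
# keys in place of the softmax makes the product associative, so
# (Q K^T) V can be computed as Q (K^T V), which is linear in the number
# of pixels n instead of quadratic.
import torch
import torch.nn.functional as F

def fast_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, N, C) with N = H*W flattened pixel tokens."""
    n = q.shape[1]
    q = F.normalize(q, dim=-1)          # replaces the row-wise softmax
    k = F.normalize(k, dim=-1)
    context = k.transpose(1, 2) @ v     # (B, C, C): cheap when C << N
    return (q @ context) / n            # (B, N, C)
```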