Related papers: AASeg: Attention Aware Network for Real Time Semantic Segmentation

Related papers

LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance [56.474856189865946]
Large multi-modal models (LMMs) struggle with inaccurate segmentation and hallucinated comprehension.<n>We propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation.<n>LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks.
arXiv Detail & Related papers (2025-07-08T07:46:26Z)
Pay Attention to What and Where? Interpretable Feature Extractor in Vision-based Deep Reinforcement Learning [2.713322720372114]
Current approaches in Explainable Deep Reinforcement Learning have limitations in which the attention mask has a displacement with the objects in visual input.<n>We propose the Interpretable Feature Extractor architecture, aimed at generating an accurate attention mask to illustrate both "what" and "where" the agent concentrates on in the spatial domain.<n>The resulting attention mask is consistent, highly understandable by humans, accurate in spatial dimension, and effectively highlights important objects or locations in visual input.
arXiv Detail & Related papers (2025-04-14T10:18:34Z)
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to Grow the quality of fusion results and Enable downstream task adaptability.<n> Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM.<n>Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z)
ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships.<n>Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands.<n>We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
A Deep Semantic Segmentation Network with Semantic and Contextual Refinements [11.755865577258767]
This paper introduces a Semantic Refinement Module (SRM) to address this issue within the segmentation network. A Contextual Refinement Module (CRM) is presented to capture global context information across both spatial and channel dimensions. The efficacy of these proposed modules is validated on three widely used datasets-Cityscapes, Bdd100K, and ADE20K-demonstrating superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-12-11T03:40:46Z)
Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery [4.499833362998487]
This study explores the effectiveness of a Cut-and-Paste augmentation technique for semantic segmentation in satellite images.<n>We adapt this augmentation, which usually requires labeled instances, to the case of semantic segmentation.<n>Using the DynamicEarthNet dataset and a U-Net model for evaluation, we found that this augmentation significantly enhances the mIoU score on the test set from 37.9 to 44.1.
arXiv Detail & Related papers (2024-04-08T17:18:30Z)
Segment Anything Model is a Good Teacher for Local Feature Learning [19.66262816561457]
Local feature detection and description play an important role in many computer vision tasks. Data-driven local feature learning methods need to rely on pixel-level correspondence for training. We propose SAMFeat to introduce SAM as a teacher to guide local feature learning.
arXiv Detail & Related papers (2023-09-29T05:29:20Z)
Cross-CBAM: A Lightweight network for Scene Segmentation [2.064612766965483]
We present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. In experiments on the Cityscapes dataset and Camvid dataset, we achieve 73.4% mIoU with a speed of 240.9FPS and 77.2% mIoU with a speed of 88.6FPS on NVIDIA GTX 1080Ti.
arXiv Detail & Related papers (2023-06-04T09:03:05Z)
A Self-Training Framework Based on Multi-Scale Attention Fusion for Weakly Supervised Semantic Segmentation [7.36778096476552]
We propose a self-training method that utilizes fused multi-scale class-aware attention maps. We collect information from attention maps of different scales and obtain multi-scale attention maps. We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas.
arXiv Detail & Related papers (2023-05-10T02:16:12Z)
De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic in the context of vision for autonomous vehicles. Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
Deep Multi-Branch Aggregation Network for Real-Time Semantic Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference. We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes. Our proposed DMA-Net respectively obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at the inference speed of 46.7 FPS and 119.8 FPS by only using a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z)
Residual Moment Loss for Medical Image Segmentation [56.72261489147506]
Location information is proven to benefit the deep learning models on capturing the manifold structure of target objects. Most existing methods encode the location information in an implicit way, for the network to learn. We propose a novel loss function, namely residual moment (RM) loss, to explicitly embed the location information of segmentation targets.
arXiv Detail & Related papers (2021-06-27T09:31:49Z)
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich-temporal information online. PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
CTNet: Context-based Tandem Network for Semantic Segmentation [77.4337867789772]
This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual information. To further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated.
arXiv Detail & Related papers (2021-04-20T07:33:11Z)
STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP) Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains. For measuring the performance, we propose a novel evaluation metric and Tracking Quality (STQ)
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time. The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism. We show that results on multiple datasets demonstrate superior performance with better accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
Semantic Segmentation With Multi Scale Spatial Attention For Self Driving Cars [2.7317088388886384]
We present a novel neural network using multi scale feature fusion at various scales for accurate and efficient semantic image segmentation. We used ResNet based feature extractor, dilated convolutional layers in downsampling part, atrous convolutional layers in the upsampling part and used concat operation to merge them. A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
arXiv Detail & Related papers (2020-06-30T20:19:09Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations. We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundant and improve the efficiency of self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.