AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation

URL: http://arxiv.org/abs/2510.21536v1
Date: Fri, 24 Oct 2025 15:01:18 GMT
Title: AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation
Authors: Narendhiran Vijayakumar, Sridevi. M
Abstract summary: We propose Attention-Guided Upsampling with Residual Boundary-Assistive Refinement (AURASeg) for semantic segmentation. Our approach achieves an improvement in mean Intersection-over-Union (mIoU) of +1.26% and segmentation precision of +1.65% compared to state-of-the-art models.
Score: 0.5729426778193399
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Free-space ground segmentation is essential for robots and autonomous vehicles to navigate, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging for existing segmentation models, particularly for robots in indoor and structured environments. These difficulties arise from ineffective multi-scale processing, suboptimal boundary refinement, and limited feature representation. To overcome these limitations, we propose Attention-Guided Upsampling with Residual Boundary-Assistive Refinement (AURASeg), a ground-plane semantic segmentation model that maintains high segmentation accuracy while improving border precision. Our method uses a CSP-Darknet backbone and adds a Residual Border Refinement Module (RBRM) for accurate edge delineation and an Attention Progressive Upsampling Decoder (APUD) for strong feature integration. We also incorporate a lightweight Atrous Spatial Pyramid Pooling (ASPP-Lite) module to ensure multi-scale context extraction without compromising real-time performance. The proposed model outperforms benchmark segmentation architectures in mIoU and F1 metrics when tested on the Ground Mobile Robot Perception (GMRP) Dataset and a custom Gazebo indoor dataset. Our approach achieves an improvement in mean Intersection-over-Union (mIoU) of +1.26% and segmentation precision of +1.65% compared to state-of-the-art models. These results show that our technique is feasible for autonomous perception in both indoor and outdoor environments, enabling precise border refinement with minimal effect on inference speed.
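
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how mIoU, precision, and F1 are commonly computed for a two-class (drivable vs. background) mask. It is not the authors' evaluation code: the function names, the flattened 0/1 mask representation, and the choice to average IoU over exactly two classes are assumptions made for illustration, and the paper's actual protocol is not described on this page.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for flattened binary masks (1 = drivable, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return tp, fp, fn, tn


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: two-class mIoU plus precision, recall, and F1 for the drivable class."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    iou_drivable = safe_div(tp, tp + fp + fn)
    # For the background class, TN plays the role of TP and the FP/FN roles swap.
    iou_background = safe_div(tn, tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "miou": (iou_drivable + iou_background) / 2,  # assumption: mean over two classes
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy six-pixel example; real masks would be flattened H x W arrays.
    predicted = [1, 1, 0, 0, 1, 0]
    ground_truth = [1, 0, 0, 0, 1, 1]
    print(segmentation_metrics(predicted, ground_truth))
```

Under this reading, a reported gain such as +1.26% mIoU would be a difference between two such scores, typically aggregated over a whole test set rather than a single image.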

Related papers

Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory [101.2076718776139]
We propose a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments. We introduce a Pose-free Memory (HPMC) that distills historical latents into a fixed-budget geometric representation. We also propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a tri-state logic.
arXiv Detail & Related papers (2026-02-02T17:52:56Z)
CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation [6.478456907626643]
Cross-Layer Independent Deformable Description (CLIDD) is a method that achieves superior distinctiveness by sampling directly from independent feature hierarchies. To ensure real-time performance, we implement a hardware-aware kernel fusion strategy. We develop a scalable framework that integrates lightweight architectures with a training protocol.
arXiv Detail & Related papers (2026-01-14T07:03:01Z)
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning [91.90342432541138]
Scaling up model size and training data has advanced foundation models for instance-level perception. High computational cost limits adoption on resource-constrained platforms. We introduce a new benchmark for efficient segmentation on both high-performance computing platforms and mobile devices.
arXiv Detail & Related papers (2025-10-16T18:00:00Z)
Bridging the Synthetic-Real Gap: Supervised Domain Adaptation for Robust Spacecraft 6-DoF Pose Estimation [13.83897333268682]
Spacecraft Pose Estimation is a fundamental capability for autonomous space operations such as rendezvous, docking, and in-orbit docking. Existing domain adaptation approaches aim to mitigate this issue but often underperform when a modest number of labeled target samples are available. We propose the first Supervised Domain Adaptation (SDA) framework tailored for SPE keypoint regression.
arXiv Detail & Related papers (2025-09-17T08:03:05Z)
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive [58.0729162588429]
Interactive segmentation improves annotation efficiency by segmenting target regions from user prompts. Current approaches face a critical trade-off: dense-token methods achieve superior accuracy but suffer from prohibitively slow processing on CPU devices. We propose Inter2Former to address this challenge by optimizing computation allocation in dense-token processing.
arXiv Detail & Related papers (2025-07-13T12:33:37Z)
VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
A Deep Learning Framework for Boundary-Aware Semantic Segmentation [9.680285420002516]
This study proposes a Mask2Former-based semantic segmentation algorithm incorporating a boundary enhancement feature bridging module (BEFBM). The proposed approach achieves significant improvements in metrics such as mIOU, mDICE, and mRecall. Visual analysis confirms the model's advantages in fine-grained regions.
arXiv Detail & Related papers (2025-03-28T00:00:08Z)
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt-parsing module that bridges text understanding and layout generation. MIGLoRA is a parameter-efficient plug-in integrating Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones. The proposed method achieves state-of-the-art performance on COCO and LVIS benchmarks while maintaining parameter efficiency.
arXiv Detail & Related papers (2025-03-27T00:59:14Z)
Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications [0.7951977175758216]
Traditional RGB-D processing pipelines focus primarily on geometric reconstruction. We introduce a novel end-to-end modular pipeline that integrates semantic segmentation, human tracking, point-cloud fusion, and scene reconstruction. We validate our approach on benchmark datasets and real-world Kinect RGB-D data, demonstrating improved efficiency, accuracy, and usability.
arXiv Detail & Related papers (2024-10-23T16:01:31Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots [17.90723909170376]
We introduce Mobile-Seed, a lightweight framework for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2023-11-21T14:53:02Z)
