Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation
- URL: http://arxiv.org/abs/2511.01434v1
- Date: Mon, 03 Nov 2025 10:36:57 GMT
- Title: Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation
- Authors: Seongkyu Choi, Jhonghyun An
- Abstract summary: Designs that fuse only at low resolution blur edges and propagate local errors. We introduce a resolution-aware token decoder that balances global semantics, local consistency, and boundary fidelity under imperfect supervision.
- Score: 0.7734726150561086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-road semantic segmentation suffers from thick, inconsistent boundaries, sparse supervision for rare classes, and pervasive label noise. Designs that fuse only at low resolution blur edges and propagate local errors, whereas maintaining high-resolution pathways or repeating high-resolution fusions is costly and fragile to noise. We introduce a resolution-aware token decoder that balances global semantics, local consistency, and boundary fidelity under imperfect supervision. Most computation occurs at a low-resolution bottleneck; a gated cross-attention injects fine-scale detail, and only a sparse, uncertainty-selected set of pixels is refined. The components are co-designed and tightly integrated: global self-attention with lightweight dilated depthwise refinement restores local coherence; a gated cross-attention integrates fine-scale features from a standard high-resolution encoder stream without amplifying noise; and a class-aware point refinement corrects residual ambiguities with negligible overhead. During training, we add a boundary-band consistency regularizer that encourages coherent predictions in a thin neighborhood around annotated edges, with no inference-time cost. Overall, the results indicate competitive performance and improved stability across transitions.
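Two ideas from the abstract, the sparse uncertainty-selected set of pixels and the thin band around annotated edges, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how uncertainty is measured or how the band is built, so the top-2 probability margin, the 4-neighbour dilation, and the names `select_uncertain_points`, `boundary_band`, `k`, and `grow` are all assumptions made for illustration.

```python
import numpy as np


def select_uncertain_points(probs: np.ndarray, k: int) -> np.ndarray:
    """Return (row, col) indices of the k most ambiguous pixels.

    probs has shape (C, H, W) with per-class probabilities (C >= 2).
    Ambiguity is measured by the top-2 margin (an assumed criterion):
    a small gap between the two most likely classes means high uncertainty.
    """
    # Sort along the class axis; the last two rows are the two largest values.
    top2 = np.sort(probs, axis=0)[-2:]
    margin = top2[1] - top2[0]
    # Stable argsort over the flattened map, smallest margin first.
    flat = np.argsort(margin, axis=None, kind="stable")[:k]
    rows, cols = np.unravel_index(flat, margin.shape)
    return np.stack([rows, cols], axis=1)


def boundary_band(labels: np.ndarray, grow: int = 0) -> np.ndarray:
    """Boolean mask of pixels on or near a label transition.

    labels has shape (H, W). Pixels on either side of a horizontal or
    vertical label change are marked, then the mask is dilated `grow`
    times with a 4-neighbour structuring element (an assumed construction).
    """
    band = np.zeros(labels.shape, dtype=bool)
    diff_v = labels[1:, :] != labels[:-1, :]   # vertical transitions
    diff_h = labels[:, 1:] != labels[:, :-1]   # horizontal transitions
    band[1:, :] |= diff_v
    band[:-1, :] |= diff_v
    band[:, 1:] |= diff_h
    band[:, :-1] |= diff_h
    for _ in range(grow):
        p = np.pad(band, 1, mode="constant", constant_values=False)
        band = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])
    return band


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 8, 8))
    probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    points = select_uncertain_points(probs, k=5)
    print(points.shape)  # five (row, col) pairs

    labels = np.zeros((8, 8), dtype=int)
    labels[:, 4:] = 1
    print(boundary_band(labels, grow=1).sum())  # number of pixels in the band
```

In a design like the one described, a refinement step would be run only at the returned points and a consistency term evaluated only inside the band, which is what keeps both inexpensive relative to dense high-resolution processing.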
Related papers
- Generalizing GNNs with Tokenized Mixture of Experts [75.8310720413187]
We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
arXiv Detail & Related papers (2026-02-09T22:48:30Z) - CroBIM-U: Uncertainty-Driven Referring Remote Sensing Image Segmentation [8.834663340762562]
Referring remote sensing image segmentation aims to localize specific targets described by natural language within complex overhead imagery. Existing methods typically employ uniform fusion and refinement strategies across the entire image. We propose an uncertainty-guided framework that explicitly leverages a pixel-wise referring uncertainty map as a spatial prior to orchestrate adaptive inference.
arXiv Detail & Related papers (2026-01-07T01:02:39Z) - A Dual-Branch Local-Global Framework for Cross-Resolution Land Cover Mapping [16.429154404656412]
Cross-resolution land cover mapping aims to produce high-resolution semantic predictions from coarse or low-resolution supervision. Existing weakly supervised approaches often struggle to align fine-grained spatial structures with coarse labels. We propose DDTM, a dual-branch weakly supervised framework that explicitly decouples local semantic refinement from global contextual reasoning.
arXiv Detail & Related papers (2025-12-23T02:32:02Z) - UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction [83.48950950780554]
Building extraction from remote sensing images is a challenging task due to the complex structure variations of buildings. Existing methods employ convolutional or self-attention blocks to capture the multi-scale features in the segmentation models. We present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet) to exploit high-quality global-local visual semantics.
arXiv Detail & Related papers (2025-12-15T02:59:16Z) - MetaDCSeg: Robust Medical Image Segmentation via Meta Dynamic Center Weighting [77.31583168890633]
Medical image segmentation is crucial for clinical applications, but it is frequently disrupted by noisy annotations and ambiguous anatomical boundaries. We propose MetaDCSeg, a robust framework that learns optimal pixel-wise weights to suppress the influence of noisy ground-truth labels. Our approach utilizes weighted feature distances for foreground, background, and boundary centers, directing the model's attention toward hard-to-segment pixels near ambiguous boundaries.
arXiv Detail & Related papers (2025-11-24T08:51:02Z) - Scale-DiT: Ultra-High-Resolution Image Generation with Hierarchical Local Attention [50.391914489898774]
Scale-DiT is a new diffusion framework that introduces hierarchical local attention with low-resolution global guidance. A lightweight LoRA adaptation bridges global and local pathways during denoising, ensuring consistency across structure and detail. Experiments demonstrate that Scale-DiT achieves more than $2\times$ faster inference and lower memory usage.
arXiv Detail & Related papers (2025-10-18T03:15:26Z) - DENet: Dual-Path Edge Network with Global-Local Attention for Infrared Small Target Detection [5.672707725914493]
Infrared small target detection is crucial for remote sensing applications like disaster warning and maritime surveillance. A fundamental challenge in designing deep models for this task lies in the inherent conflict between capturing high-resolution spatial details for minute targets and extracting robust semantic context for larger targets. Existing methods often rely on fixed gradient operators or simplistic attention mechanisms, which are inadequate for accurately extracting target edges under low contrast and high noise. We propose a novel Dual-Path Edge Network that explicitly addresses this challenge by decoupling edge enhancement and semantic modeling into two complementary processing paths.
arXiv Detail & Related papers (2025-09-25T03:08:26Z) - Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution [27.3322913419539]
Single-image super-resolution (SISR) remains highly ill-posed because recovering structurally faithful high-frequency content from a single low-resolution observation is ambiguous. Existing edge-aware methods often attach edge priors or attention branches onto increasingly complex backbones, yet ad hoc fusion frequently introduces redundancy, unstable optimization, or limited structural gains. We address this gap with an edge-guided attention mechanism that derives an adaptive modulation map from jointly encoded edge features and intermediate feature activations, then applies it to normalize and reweight responses, selectively amplifying structurally salient regions while suppressing spurious textures.
arXiv Detail & Related papers (2025-09-18T02:31:24Z) - TinyDef-DETR: A Transformer-Based Framework for Defect Detection in Transmission Lines from UAV Imagery [12.48571944931548]
TinyDef-DETR is a framework designed to achieve accurate and efficient detection of transmission line defects from UAV-acquired images. The model integrates four major components: an edge-enhanced ResNet backbone to strengthen boundary-sensitive representations, a stride-free space-to-depth module to enable detail-preserving downsampling, and a Focaler-Wise-SIoU regression loss to improve the localization of small and difficult objects.
arXiv Detail & Related papers (2025-09-07T12:36:33Z) - Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z) - Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z) - Boundary Corrected Multi-scale Fusion Network for Real-time Semantic
Segmentation [15.879949436633021]
Existing semantic segmentation methods rely on the high-resolution input to achieve high accuracy and do not meet the requirements of inference time.
We propose a new method named Boundary Corrected Multi-scale Fusion Network, which uses the designed Low-resolution Multi-scale Fusion Module to extract semantic information.
Our method achieves a state-of-the-art balance of accuracy and speed for the real-time semantic segmentation.
arXiv Detail & Related papers (2022-03-01T13:31:01Z) - Learning to Estimate Hidden Motions with Global Motion Aggregation [71.12650817490318]
Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
arXiv Detail & Related papers (2021-04-06T10:32:03Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)