Sparse Refinement for Efficient High-Resolution Semantic Segmentation
- URL: http://arxiv.org/abs/2407.19014v1
- Date: Fri, 26 Jul 2024 18:00:09 GMT
- Title: Sparse Refinement for Efficient High-Resolution Semantic Segmentation
- Authors: Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han
- Abstract summary: SparseRefine enhances dense low-resolution predictions with sparse high-resolution refinements.
It can be seamlessly integrated into any existing semantic segmentation model.
It achieves significant speedup: 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes.
- Score: 40.243181997916615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture the fine details. However, this comes at the cost of considerable computational complexity, hindering the deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly integrated into any existing semantic segmentation model, regardless of CNN- or ViT-based. SparseRefine achieves significant speedup: 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy. Our "dense+sparse" paradigm paves the way for efficient high-resolution visual computing.
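To make the "dense+sparse" pipeline concrete, here is a minimal PyTorch sketch of the entropy selection and gated ensembling steps described in the abstract; the function names, keep ratio, and stand-in refiner/gating tensors are hypothetical, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def entropy_select(logits, keep_ratio=0.1):
    # logits: (C, H, W) low-resolution class scores, upsampled to target size.
    # Returns flat indices of the highest-entropy (most uncertain) pixels.
    probs = F.softmax(logits, dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=0)  # (H, W)
    k = max(1, int(keep_ratio * entropy.numel()))
    return entropy.flatten().topk(k).indices

def gated_ensemble(coarse, refined, gate):
    # coarse, refined: (C, k) logits at the selected pixels; gate: (k,) in [0, 1].
    # The gate decides, per pixel, how much of the sparse refinement to accept.
    return gate * refined + (1.0 - gate) * coarse

# Usage: refine only the most uncertain pixels of a coarse prediction.
coarse = torch.randn(19, 512, 1024)                      # e.g. 19 Cityscapes classes
idx = entropy_select(coarse, keep_ratio=0.1)             # sparse set of pixels
flat = coarse.flatten(1)                                 # (19, H*W) view
refined = flat[:, idx] + torch.randn_like(flat[:, idx])  # stand-in sparse refiner
gate = torch.rand(idx.numel())                           # stand-in gating network
flat[:, idx] = gated_ensemble(flat[:, idx], refined, gate)
```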
Related papers
- High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction. We propose a training-free adaptive masking module for acceleration. Our method reduces FLOPs by 24-43% for state-of-the-art models.
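A training-free high-frequency prior can be approximated with a fixed Laplacian filter; the sketch below (hypothetical names, not the paper's implementation) keeps only the patches with the strongest high-frequency response for full computation.

```python
import torch
import torch.nn.functional as F

def high_frequency_patch_mask(image, patch=8, keep_ratio=0.6):
    # image: (1, 1, H, W) grayscale tensor; H and W divisible by patch.
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)
    response = F.conv2d(image, lap, padding=1).abs()   # high-frequency energy
    score = F.avg_pool2d(response, patch).flatten()    # one score per patch
    k = max(1, int(keep_ratio * score.numel()))
    mask = torch.zeros_like(score, dtype=torch.bool)
    mask[score.topk(k).indices] = True                 # True = compute at full cost
    return mask.view(image.shape[-2] // patch, image.shape[-1] // patch)
```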
arXiv Detail & Related papers (2025-05-11T13:18:03Z)
- Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling [50.34513854725803]
Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors.
We propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting.
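To make the Pixel-to-Gaussian idea concrete, the sketch below evaluates a mixture of 2D Gaussians at arbitrary continuous coordinates. It uses isotropic Gaussians for brevity; the actual ContinuousSR method predicts full 2D Gaussian parameters from LR features.

```python
import torch

def render_gaussian_field(query_xy, centers, colors, scales):
    # query_xy: (Q, 2) continuous coordinates; centers: (G, 2);
    # colors: (G, 3); scales: (G,) isotropic standard deviations.
    d2 = ((query_xy[:, None] - centers[None]) ** 2).sum(-1)   # (Q, G)
    w = torch.exp(-0.5 * d2 / scales[None] ** 2)              # Gaussian falloff
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)        # normalize per query
    return w @ colors                                         # (Q, 3) colors

# Because queries are continuous, any upsampling factor samples the same field.
queries = torch.rand(4096, 2)  # arbitrary (x, y) positions in [0, 1)^2
```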
arXiv Detail & Related papers (2025-03-09T13:43:57Z)
- SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer [62.11796778482088]
We present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap of object detection between close-up and HRW shots.
The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects.
Experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (by up to 5.8%) and speed (by up to 3x) over state-of-the-art approaches.
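The window-selection step can be sketched as a learned scorer followed by top-k, so only the selected windows pay the transformer's cost; the scorer and keep ratio below are illustrative stand-ins, not SparseFormer's actual modules.

```python
import torch
import torch.nn as nn

def select_attentive_windows(window_feats, scorer, keep_ratio=0.25):
    # window_feats: (N, D) pooled features for N windows of an HRW frame;
    # scorer: nn.Linear(D, 1). Only the top-scoring windows go through
    # full attention; the rest are skipped or handled cheaply.
    scores = scorer(window_feats).squeeze(-1)          # (N,)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = scores.topk(k).indices
    return window_feats[keep], keep

scorer = nn.Linear(256, 1)
feats = torch.randn(1024, 256)                         # 1024 candidate windows
kept, idx = select_attentive_windows(feats, scorer)    # 256 windows attended
```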
arXiv Detail & Related papers (2025-02-11T03:21:25Z)
- LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting [18.682864169561498]
LoopSparseGS is a loop-based 3DGS framework for the sparse novel view synthesis task.
We introduce a novel Sparse-friendly Sampling (SFS) strategy to handle oversized Gaussian ellipsoids that lead to large pixel errors.
Experiments on four datasets demonstrate that LoopSparseGS outperforms existing state-of-the-art methods for sparse-input novel view synthesis.
arXiv Detail & Related papers (2024-08-01T03:26:50Z)
- InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
Novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision.
However, it typically relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM).
In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
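A minimal version of the LRSA idea: pool features to a fixed low resolution, run self-attention there, and upsample the result back, so the attention cost is independent of the input resolution (module sizes below are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    def __init__(self, channels, heads=4, pool_size=16):
        super().__init__()
        self.pool_size = pool_size
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, H, W), any H, W
        B, C, H, W = x.shape
        p = self.pool_size
        lr = F.adaptive_avg_pool2d(x, p)               # fixed (B, C, p, p)
        tokens = lr.flatten(2).transpose(1, 2)         # (B, p*p, C)
        out, _ = self.attn(tokens, tokens, tokens)     # global context, constant cost
        out = out.transpose(1, 2).reshape(B, C, p, p)
        return x + F.interpolate(out, size=(H, W), mode="bilinear", align_corners=False)
```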
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution [22.60056946339325]
We propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling.
The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features.
We demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy.
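Graph attention over a pixel's immediate neighbours can be sketched with unfold: each pixel attends to its 3x3 neighbourhood and aggregates their features. This is a simplified stand-in for the PAM, not its exact formulation.

```python
import torch
import torch.nn.functional as F

def neighborhood_attention(feat):
    # feat: (B, C, H, W). Each pixel is a graph node; its 3x3 neighbours
    # are the connected nodes. Attention weights are scaled dot products.
    B, C, H, W = feat.shape
    nbrs = F.unfold(feat, kernel_size=3, padding=1).view(B, C, 9, H * W)
    q = feat.view(B, C, 1, H * W)
    logits = (q * nbrs).sum(dim=1, keepdim=True) / C ** 0.5   # (B, 1, 9, HW)
    attn = logits.softmax(dim=2)                              # weights over neighbours
    return (attn * nbrs).sum(dim=2).view(B, C, H, W)
```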
arXiv Detail & Related papers (2023-09-16T08:12:12Z)
- CUF: Continuous Upsampling Filters [25.584630142930123]
In this paper, we consider one of the most important operations in image processing: upsampling.
We propose to parameterize upsampling kernels as neural fields.
This parameterization leads to a compact architecture that obtains a 40-fold reduction in the number of parameters when compared with competing arbitrary-scale super-resolution architectures.
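Parameterizing an upsampling kernel as a neural field means a small MLP maps a continuous sub-pixel offset to interpolation weights, so one parameter set covers every scale; the sketch below is a schematic reading of that idea, not the CUF architecture.

```python
import torch
import torch.nn as nn

class ContinuousUpsamplingKernel(nn.Module):
    # Maps a fractional offset inside a source pixel to per-channel weights.
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, channels))

    def forward(self, offsets):                # offsets: (N, 2) in [0, 1)^2
        return self.mlp(offsets)               # (N, channels) kernel weights

kernel = ContinuousUpsamplingKernel(channels=64)
# A 3x upsample and a 7.5x upsample query the same MLP, just at
# different fractional offsets: no per-scale weight tables needed.
weights = kernel(torch.rand(100, 2))
```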
arXiv Detail & Related papers (2022-10-13T12:45:51Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
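The core of the DB module is a steep sigmoid that approximates hard thresholding, B = 1 / (1 + exp(-k (P - T))), which keeps the binarization step differentiable (k = 50 in the paper):

```python
import torch

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    # prob_map P and thresh_map T: (B, 1, H, W) network outputs.
    # A steep sigmoid stands in for the hard cut (P > T), so gradients
    # flow through the thresholding step during training.
    return torch.sigmoid(k * (prob_map - thresh_map))
```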
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
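Dynamic convolution here means the kernel is generated from an instance descriptor rather than stored as a fixed weight; a minimal sketch with hypothetical shapes and 1x1 kernels only:

```python
import torch
import torch.nn as nn

def dynamic_instance_mask(point_feats, instance_desc, kernel_gen):
    # point_feats: (C, N) features for N points; instance_desc: (E,)
    # describes one candidate instance; kernel_gen: nn.Linear(E, C)
    # emits that instance's 1x1 convolution weights on the fly.
    weight = kernel_gen(instance_desc)         # (C,) instance-specific kernel
    return weight @ point_feats                # (N,) mask logits for this instance

kernel_gen = nn.Linear(32, 64)
logits = dynamic_instance_mask(torch.randn(64, 5000), torch.randn(32), kernel_gen)
```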
arXiv Detail & Related papers (2020-11-26T14:56:57Z)
- Hyperspectral Image Super-resolution via Deep Progressive Zero-centric Residual Learning [62.52242684874278]
The cross-modality distribution of spatial and spectral information makes the problem challenging.
We propose a novel lightweight deep neural network-based framework, namely PZRes-Net.
Our framework learns a high-resolution, zero-centric residual image that contains the high-frequency spatial details of the scene.
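One plausible reading of the zero-centric constraint is that the predicted residual carries spatial detail but no spectral offset; the sketch below enforces a zero mean across bands (an illustrative interpretation, not the paper's exact formulation).

```python
import torch

def zero_centric_residual(residual):
    # residual: (B, C, H, W) predicted high-frequency detail over C bands.
    # Subtracting the per-pixel mean across bands makes the residual
    # zero-centric: it adds detail without shifting spectral energy.
    return residual - residual.mean(dim=1, keepdim=True)
```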
arXiv Detail & Related papers (2020-06-18T06:32:11Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP)
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
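Cascaded factorized atrous convolutions can be sketched by splitting each 3x3 atrous convolution into a 3x1 and a 1x3 convolution, cutting cost while keeping the dilated receptive field; the branch layout below is illustrative, not the exact CF-ASPP design.

```python
import torch.nn as nn

def factorized_atrous_branch(channels, dilation):
    # One branch of a factorized ASPP: a 3x3 atrous conv split into
    # 3x1 then 1x3, reducing per-branch weights from 9 to 6 per channel pair.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(3, 1),
                  padding=(dilation, 0), dilation=(dilation, 1)),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=(1, 3),
                  padding=(0, dilation), dilation=(1, dilation)),
        nn.ReLU(inplace=True),
    )

# Parallel branches at growing dilation rates aggregate multi-scale context.
branches = nn.ModuleList(factorized_atrous_branch(128, d) for d in (1, 2, 4, 8))
```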
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.