HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
- URL: http://arxiv.org/abs/2507.12883v1
- Date: Thu, 17 Jul 2025 08:09:31 GMT
- Title: HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
- Authors: Weihuang Lin, Yiwei Ma, Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji
- Abstract summary: HRSeg is an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reasoning segmentation task involves segmenting objects within an image by interpreting implicit user instructions, which may encompass subtleties such as contextual cues and open-world knowledge. Despite significant advancements made by existing approaches, they remain constrained by low perceptual resolution, as visual encoders are typically pre-trained at lower resolutions. Furthermore, simply interpolating the positional embeddings of visual encoders to enhance perceptual resolution yields only marginal performance improvements while incurring substantial computational costs. To address this, we propose HRSeg, an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE). The HRP module processes high-resolution images through cropping, integrating local and global features to obtain multi-granularity representations. The HRE module enhances mask features by integrating fine-grained information from high-resolution images, refining their alignment with text features for precise segmentation. Extensive ablation studies validate the effectiveness of our modules, while comprehensive experiments on multiple benchmark datasets demonstrate HRSeg's superior performance.
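Since the abstract gives only a high-level description of the two modules, the following PyTorch sketch illustrates one plausible reading of HRP (crop the high-resolution image, encode local crops and a global view, fuse) and HRE (let mask features attend to high-resolution features, gated by the text embedding). Module names, the 336-pixel encoder resolution, and all wiring are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HRPerception(nn.Module):
    """Hypothetical HRP module: encode local crops of the high-resolution
    image plus a downsampled global view, then fuse both streams."""
    def __init__(self, encoder: nn.Module, dim: int, grid: int = 2):
        super().__init__()
        self.encoder = encoder            # any patch encoder returning (B, N, dim)
        self.grid = grid                  # grid x grid local crops
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, hr_image: torch.Tensor) -> torch.Tensor:
        b, c, h, w = hr_image.shape
        gh, gw = h // self.grid, w // self.grid
        # Local branch: each crop is encoded at the encoder's native input
        # size (336 is an assumption, e.g. a CLIP-style ViT).
        crops = [hr_image[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                 for i in range(self.grid) for j in range(self.grid)]
        local = torch.cat([self.encoder(F.interpolate(cr, (336, 336), mode="bilinear"))
                           for cr in crops], dim=1)
        # Global branch: a downsampled full view supplies scene-level context.
        glob = self.encoder(F.interpolate(hr_image, (336, 336), mode="bilinear"))
        glob = glob.mean(dim=1, keepdim=True).expand(-1, local.size(1), -1)
        return self.fuse(torch.cat([local, glob], dim=-1))

class HREnhancement(nn.Module):
    """Hypothetical HRE module: coarse mask features attend to fine-grained
    high-resolution features, with a text-conditioned residual gate."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, mask_feat, hr_feat, text_feat):
        # mask_feat: (B, Nq, dim), hr_feat: (B, Nh, dim), text_feat: (B, 1, dim)
        enhanced, _ = self.attn(mask_feat, hr_feat, hr_feat)  # inject HR detail
        gate = torch.sigmoid((enhanced * text_feat).sum(-1, keepdim=True))
        return mask_feat + gate * enhanced                    # text-aligned refinement
```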
Related papers
- JAFAR: Jack up Any Feature at Any Resolution [53.343826346140624]
JAFAR is a lightweight and flexible feature upsampler for foundation vision encoders. It enhances the spatial resolution of visual features from any foundation vision encoder to an arbitrary target resolution and generalizes remarkably well to significantly higher output scales (a hedged sketch of the idea follows this entry).
arXiv Detail & Related papers (2025-06-10T20:53:12Z)
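As a rough, non-authoritative illustration of attention-based feature upsampling in the spirit of JAFAR (the real architecture differs; the query/key construction below is a guess):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionUpsampler(nn.Module):
    """Hedged sketch of JAFAR-style feature upsampling: queries come from a
    high-resolution guidance image, keys/values from low-resolution features.
    Layer names and dimensions are illustrative assumptions."""
    def __init__(self, feat_dim: int, guide_dim: int = 3, dim: int = 64, heads: int = 4):
        super().__init__()
        self.q = nn.Conv2d(guide_dim, dim, 1)
        self.k = nn.Conv2d(feat_dim, dim, 1)
        self.v = nn.Conv2d(feat_dim, dim, 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Conv2d(dim, feat_dim, 1)

    def forward(self, feats, image, out_hw):
        b = feats.size(0)
        # Queries live on the arbitrary target grid; keys/values stay low-res.
        q = self.q(F.interpolate(image, out_hw, mode="bilinear")).flatten(2).transpose(1, 2)
        k = self.k(feats).flatten(2).transpose(1, 2)
        v = self.v(feats).flatten(2).transpose(1, 2)
        up, _ = self.attn(q, k, v)       # HR queries attend to LR features
        return self.out(up.transpose(1, 2).reshape(b, -1, *out_hw))
```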
- Semantic-Guided Global-Local Collaborative Networks for Lightweight Image Super-Resolution [9.666827340439669]
Single-Image Super-Resolution (SISR) plays a pivotal role in enhancing the accuracy and reliability of measurement systems. We propose a Semantic-Guided Global-Local Collaborative Network (SGGLC-Net) for lightweight SISR.
arXiv Detail & Related papers (2025-03-20T11:43:55Z)
- HRDecoder: High-Resolution Decoder Network for Fundus Image Lesion Segmentation [12.606794661369959]
We propose HRDecoder, a simple high-resolution decoder network for fundus lesion segmentation. It integrates a high-resolution representation learning module to capture fine-grained local features and a high-resolution fusion module to fuse multi-scale predictions. Our method effectively improves the overall segmentation accuracy of fundus lesions while consuming reasonable memory and computational overhead and maintaining satisfactory inference speed (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-06T15:13:31Z)
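A minimal sketch of the multi-scale prediction-fusion idea described above; the two-conv head, the scale set, and the mean fusion are illustrative choices, not HRDecoder's exact design:

```python
import torch.nn as nn
import torch.nn.functional as F

class HRFusionDecoder(nn.Module):
    """Hedged sketch of an HRDecoder-style head: predict lesion masks at
    several feature scales, then fuse the resized logits."""
    def __init__(self, in_dim: int, num_classes: int, scales=(1.0, 2.0)):
        super().__init__()
        self.scales = scales
        self.head = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, num_classes, 1))

    def forward(self, feats, out_hw):
        preds = []
        for s in self.scales:
            # High-resolution representation: upsample features before predicting.
            x = F.interpolate(feats, scale_factor=s, mode="bilinear", align_corners=False)
            preds.append(F.interpolate(self.head(x), out_hw, mode="bilinear", align_corners=False))
        return sum(preds) / len(preds)   # fuse multi-scale predictions
```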
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention" (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
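CoSeR's actual All-in-Attention mixes several condition sources; the single-source cross-attention below only conveys the general idea of injecting a "cognitive embedding" into an SR backbone, with all names and shapes assumed:

```python
import torch
import torch.nn as nn

class ConditionInjection(nn.Module):
    """Hedged sketch: backbone tokens cross-attend to a cognitive embedding
    derived from the low-resolution image, added back residually."""
    def __init__(self, dim: int, cond_dim: int, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(cond_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor, cognitive: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) backbone features; cognitive: (B, M, cond_dim).
        cond = self.proj(cognitive)
        out, _ = self.attn(self.norm(tokens), cond, cond)
        return tokens + out              # residual conditioning
```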
- Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach computes self-attention in a fixed low-resolution space regardless of the input image's resolution (a sketch follows this entry). We demonstrate the effectiveness of LRSA by building LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
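The fixed low-resolution trick is simple enough to sketch directly; the pooled grid size and residual wiring below are assumptions, not LRFormer's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    """Hedged sketch of LRSA: pool features to a fixed low-resolution grid,
    run multi-head self-attention there, and upsample back."""
    def __init__(self, dim: int, heads: int = 8, pool_hw: int = 16):
        super().__init__()
        self.pool_hw = pool_hw
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Attention cost depends only on pool_hw, not the input resolution.
        low = F.adaptive_avg_pool2d(x, self.pool_hw)
        tokens = low.flatten(2).transpose(1, 2)          # (B, pool_hw^2, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        low = attended.transpose(1, 2).reshape(b, c, self.pool_hw, self.pool_hw)
        return x + F.interpolate(low, (h, w), mode="bilinear", align_corners=False)
```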
- Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation [10.919956120261539]
High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. Objects of the same category within HRS images show significant differences in scale and shape across diverse geographical environments. We propose a high-resolution remote sensing network (Hi-ResNet) with efficient network structure designs.
arXiv Detail & Related papers (2023-05-22T03:58:25Z)
- Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting [44.51635913732913]
This paper reviews recent deep-learning-based matting research, from which we conceive our "wider and higher" motivation for image matting. Image matting is essentially pixel-wise regression, and the ideal situation is to perceive the maximum opacity directly from the input image. We propose an Intensive Integration and Global Foreground Perception network (I2GFP) to integrate wider and higher feature streams.
arXiv Detail & Related papers (2022-10-13T11:34:46Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising. We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet) and incorporate it into the widely used U-Net architecture to construct an HS image denoising method (a sketch follows this entry).
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
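One plausible reading of a "low-dimensional convolution set" on an HS cube is a set of parallel 1-D convolutions along the spectral and two spatial axes; the kernel shapes and the concat-then-project wiring below are guesses, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ReConvSet(nn.Module):
    """Hedged sketch for an HS cube of shape (B, C, Bands, H, W): parallel
    1-D convolutions along each axis, concatenated before projection."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        p = k // 2
        self.spec = nn.Conv3d(channels, channels, (k, 1, 1), padding=(p, 0, 0))
        self.row = nn.Conv3d(channels, channels, (1, k, 1), padding=(0, p, 0))
        self.col = nn.Conv3d(channels, channels, (1, 1, k), padding=(0, 0, p))
        self.proj = nn.Conv3d(3 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenating (rather than summing) the three responses is one way
        # to avoid the rank collapse the 'rank-enhanced' name alludes to.
        return self.proj(torch.cat([self.spec(x), self.row(x), self.col(x)], dim=1))
```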
- Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated from a low-resolution (LR) input. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. We propose Best-Buddy GANs (Beby-GAN) for rich-detail SISR; relaxing the immutable one-to-one constraint, we allow estimated patches to dynamically seek the best supervision (a sketch of the best-buddy loss follows this entry).
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
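The best-buddy idea (each predicted patch is supervised by whichever candidate HR patch it currently matches best) can be sketched as follows; the global candidate pool and L1 distance are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def best_buddy_loss(sr: torch.Tensor, hr: torch.Tensor, patch: int = 3) -> torch.Tensor:
    """Hedged sketch of a best-buddy loss: every SR patch picks its nearest
    HR patch as supervision, relaxing the fixed one-to-one correspondence."""
    sr_p = F.unfold(sr, patch, stride=patch).transpose(1, 2)   # (B, N, C*patch^2)
    hr_p = F.unfold(hr, patch, stride=patch).transpose(1, 2)
    # Pairwise L1 distances; restricting candidates to a local window
    # would cut the O(N^2) memory cost for large images.
    dist = torch.cdist(sr_p, hr_p, p=1)                        # (B, N, N)
    buddy = dist.argmin(dim=2)                                 # best buddy per SR patch
    target = torch.gather(hr_p, 1, buddy.unsqueeze(-1).expand_as(sr_p))
    return F.l1_loss(sr_p, target)
```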
- Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network that progressively processes smooth regions and details in a divide-and-conquer manner. In particular, we propose Hessian filtering for an interpretable feature representation that is well suited to detail inference (a sketch of Hessian filtering follows this entry). Experiments demonstrate that the proposed method achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)
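Second-order (Hessian) filtering for detail detection can be sketched with fixed derivative kernels; DeFiAN's exact formulation may differ, and the eigenvalue-magnitude combination below is one common choice:

```python
import torch
import torch.nn.functional as F

def hessian_detail_map(x: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of Hessian filtering: depthwise second-derivative
    convolutions, combined via the 2x2 Hessian's eigenvalue magnitudes."""
    c = x.size(1)
    def kernel(k):
        return torch.tensor(k, dtype=x.dtype, device=x.device).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    dxx = F.conv2d(x, kernel([[0., 0., 0.], [1., -2., 1.], [0., 0., 0.]]), padding=1, groups=c)
    dyy = F.conv2d(x, kernel([[0., 1., 0.], [0., -2., 0.], [0., 1., 0.]]), padding=1, groups=c)
    dxy = F.conv2d(x, kernel([[.25, 0., -.25], [0., 0., 0.], [-.25, 0., .25]]), padding=1, groups=c)
    # Eigenvalues of [[dxx, dxy], [dxy, dyy]]; large magnitudes mark fine detail.
    mean, root = (dxx + dyy) / 2, torch.sqrt(((dxx - dyy) / 2) ** 2 + dxy ** 2)
    return (mean + root).abs() + (mean - root).abs()
```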