Recurrent Multi-scale Transformer for High-Resolution Salient Object
Detection
- URL: http://arxiv.org/abs/2308.03826v2
- Date: Mon, 4 Sep 2023 06:03:58 GMT
- Title: Recurrent Multi-scale Transformer for High-Resolution Salient Object
Detection
- Authors: Xinhao Deng and Pingping Zhang and Wei Liu and Huchuan Lu
- Abstract summary: Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video.
Traditional SOD methods are largely limited to low-resolution images, making them difficult to adapt to the development of High-Resolution SOD.
In this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution.
- Score: 68.65338791283298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Salient Object Detection (SOD) aims to identify and segment the most
conspicuous objects in an image or video. As an important pre-processing step,
it has many potential applications in multimedia and vision tasks. With the
advance of imaging devices, SOD with high-resolution images is of great demand,
recently. However, traditional SOD methods are largely limited to
low-resolution images, making them difficult to adapt to the development of
High-Resolution SOD (HRSOD). Although some HRSOD methods emerge, there are no
large enough datasets for training and evaluating. Besides, current HRSOD
methods generally produce incomplete object regions and irregular object
boundaries. To address above issues, in this work, we first propose a new
HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K
resolution. As far as we know, it is the largest dataset for the HRSOD task,
which will significantly help future works in training and evaluating models.
Furthermore, to improve the HRSOD performance, we propose a novel Recurrent
Multi-scale Transformer (RMFormer), which recurrently utilizes shared
Transformers and multi-scale refinement architectures. Thus, high-resolution
saliency maps can be generated with the guidance of lower-resolution
predictions. Extensive experiments on both high-resolution and low-resolution
benchmarks show the effectiveness and superiority of the proposed framework.
The source code and dataset are released at:
https://github.com/DrowsyMon/RMFormer.
Related papers
- Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation [53.95204595640208]
Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on original training data.
Previous approaches have generated synthetic images at high resolutions without leveraging information from real images.
MUSE generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features.
arXiv Detail & Related papers (2024-11-26T02:23:31Z) - PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD.
All the images are finely annotated in pixel-level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Pyramid Grafting Network for One-Stage High Resolution Saliency
Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z) - Disentangled High Quality Salient Object Detection [8.416690566816305]
We propose a novel deep learning framework for high-resolution salient object detection (SOD)
It disentangles the task into a low-resolution saliency classification network (LRSCN) and a high-resolution refinement network (HRRN)
arXiv Detail & Related papers (2021-08-08T02:14:15Z) - Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.
Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task.
We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z) - A new public Alsat-2B dataset for single-image super-resolution [1.284647943889634]
The paper introduces a novel public remote sensing dataset (Alsat2B) of low and high spatial resolution images (10m and 2.5m respectively) for the single-image super-resolution task.
The high-resolution images are obtained through pan-sharpening.
The obtained results reveal that the proposed scheme is promising and highlight the challenges in the dataset.
arXiv Detail & Related papers (2021-03-21T10:47:38Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of
Generative Models [77.32079593577821]
PULSE (Photo Upsampling via Latent Space Exploration) generates high-resolution, realistic images at resolutions previously unseen in the literature.
Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.
arXiv Detail & Related papers (2020-03-08T16:44:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.