Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
- URL: http://arxiv.org/abs/2204.05041v2
- Date: Tue, 12 Apr 2022 08:08:00 GMT
- Title: Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
- Authors: Chenxi Xie, Changqun Xia, Mingcan Ma, Zhirui Zhao, Xiaowu Chen and Jia Li
- Abstract summary: We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
- Score: 29.013012579688347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent salient object detection (SOD) methods based on deep neural
networks have achieved remarkable performance. However, most existing SOD
models designed for low-resolution input perform poorly on high-resolution
images due to the contradiction between sampling depth and receptive field
size. Aiming to resolve this contradiction, we propose a novel one-stage
framework called Pyramid Grafting Network (PGNet), which uses transformer and
CNN backbones to extract features from images of different resolutions
independently and then grafts the features from the transformer branch onto
the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is
proposed to enable the CNN branch to combine broken detailed information more
holistically, guided by features from a different source during the decoding
process. Moreover, we design an Attention Guided Loss (AGL) to explicitly
supervise the attention matrix generated by CMGM, helping the network better
exploit the attention from the two models. We contribute a new
Ultra-High-Resolution Saliency Detection dataset, UHRSD, containing 5,920
images at 4K-8K resolutions. To our knowledge, it is the largest dataset in
both quantity and resolution for the high-resolution SOD task, and it can be
used for training and testing in future research. Extensive experiments on
UHRSD and widely used SOD datasets demonstrate that our method achieves
superior performance compared to state-of-the-art methods.
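The abstract names the two key components, CMGM and AGL, but not their exact form. The snippet below is a minimal sketch of how an attention-based grafting step and an attention-supervising loss could be wired up in PyTorch; the module layout, tensor shapes, and the outer-product loss target are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModelGrafting(nn.Module):
    """Sketch of cross-attention "grafting": CNN-branch tokens query transformer-branch tokens."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm_cnn = nn.LayerNorm(dim)
        self.norm_trans = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat_cnn, feat_trans):
        # feat_cnn, feat_trans: (B, N, C) flattened feature maps from the two branches
        q = self.norm_cnn(feat_cnn)         # detailed, high-resolution CNN features
        kv = self.norm_trans(feat_trans)    # globally contextualised transformer features
        grafted, attn_matrix = self.attn(q, kv, kv, need_weights=True)
        out = feat_cnn + self.proj(grafted)  # residual graft onto the CNN branch
        return out, attn_matrix              # attn_matrix: (B, N, N), candidate for AGL-style supervision


def attention_guided_loss(attn_matrix, saliency_gt, eps: float = 1e-6):
    """Hypothetical AGL: supervise the attention matrix with a target built from the GT mask.

    saliency_gt: (B, N) float tensor, per-token ground-truth saliency in [0, 1].
    The outer-product target (salient queries should attend to salient keys) is an
    assumption made for illustration only.
    """
    target = torch.bmm(saliency_gt.unsqueeze(2), saliency_gt.unsqueeze(1))  # (B, N, N)
    attn = attn_matrix.clamp(eps, 1.0 - eps)
    return F.binary_cross_entropy(attn, target)
```

In a full decoder, such a grafting step would be applied at a chosen pyramid level and the AGL term combined with the usual saliency losses (e.g., BCE and IoU) on the predicted maps.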
Related papers
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD.
All the images are finely annotated at pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- DRCT: Saving Image Super-resolution away from Information Bottleneck [7.765333471208582]
Vision Transformer-based approaches for low-level vision tasks have achieved widespread success.
Dense-residual-connected Transformer (DRCT) is proposed to mitigate the loss of spatial information.
Our approach surpasses state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2024-03-31T15:34:45Z)
- Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection [68.65338791283298]
Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video.
Traditional SOD methods are largely limited to low-resolution images, making them difficult to adapt to high-resolution SOD.
In this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution.
arXiv Detail & Related papers (2023-08-07T17:49:04Z)
- CoT-MISR: Marrying Convolution and Transformer for Multi-Image Super-Resolution [3.105999623265897]
Recovering high-resolution image information from a low-resolution image is a problem that researchers have long been exploring.
The CoT-MISR network combines local and global information by exploiting the complementary advantages of convolution and transformer.
arXiv Detail & Related papers (2023-03-12T03:01:29Z)
- Model Inspired Autoencoder for Unsupervised Hyperspectral Image Super-Resolution [25.878793557013207]
This paper focuses on hyperspectral image (HSI) super-resolution that aims to fuse a low-spatial-resolution HSI and a high-spatial-resolution multispectral image.
Existing deep learning-based approaches are mostly supervised and rely on a large number of labeled training samples.
We make the first attempt to design a model inspired deep network for HSI super-resolution in an unsupervised manner.
arXiv Detail & Related papers (2021-10-22T05:15:16Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth map super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt feature fusion strategies but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Locality-Aware Rotated Ship Detection in High-Resolution Remote Sensing Imagery Based on Multi-Scale Convolutional Network [7.984128966509492]
We propose a locality-aware rotated ship detection (LARSD) framework based on a multi-scale convolutional neural network (CNN).
The proposed framework applies a UNet-like multi-scale CNN to generate multi-scale feature maps with high-level information in high resolution.
To enlarge the detection dataset, we build a new high-resolution ship detection (HRSD) dataset, where 2499 images and 9269 instances were collected from Google Earth with different resolutions.
arXiv Detail & Related papers (2020-07-24T03:01:42Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture that maintains spatially precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)