EDTER: Edge Detection with Transformer
- URL: http://arxiv.org/abs/2203.08566v1
- Date: Wed, 16 Mar 2022 11:55:55 GMT
- Title: EDTER: Edge Detection with Transformer
- Authors: Mengyang Pu and Yaping Huang and Yuming Liu and Qingji Guan and Haibin Ling
- Abstract summary: We propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges.
EDTER exploits the full image context information and detailed local cues simultaneously.
Experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER over state-of-the-art methods.
- Score: 71.83960813880843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks have made significant progress in edge
detection by progressively exploring context and semantic features.
However, local details are gradually suppressed as receptive fields enlarge.
Recently, vision transformers have shown excellent capability in
capturing long-range dependencies. Inspired by this, we propose a novel
transformer-based edge detector, Edge Detection TransformER (EDTER), to
extract clear and crisp object boundaries and meaningful edges by exploiting
the full image context information and detailed local cues simultaneously.
EDTER works in two stages. In Stage I, a global transformer encoder is used to
capture long-range global context on coarse-grained image patches. Then in
Stage II, a local transformer encoder works on fine-grained patches to excavate
the short-range local cues. Each transformer encoder is followed by an
elaborately designed Bi-directional Multi-Level Aggregation decoder to achieve
high-resolution features. Finally, the global context and local cues are
combined by a Feature Fusion Module and fed into a decision head for edge
prediction. Extensive experiments on BSDS500, NYUDv2, and Multicue demonstrate
the superiority of EDTER over state-of-the-art methods.
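The abstract outlines a two-stage pipeline: a global transformer over coarse patches, a local transformer over fine patches, per-stage decoders that recover high-resolution features, and a fusion module feeding a decision head. The sketch below is a minimal PyTorch illustration of that flow only; the class names (EDTERSketch, PatchTransformerEncoder, UpsampleDecoder), patch sizes, and dimensions are assumptions, and the BiMLA decoder and Feature Fusion Module are simplified to plain convolutions, so this is not the authors' implementation.

```python
# Minimal sketch of the two-stage global/local transformer edge detector
# described in the abstract. All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class PatchTransformerEncoder(nn.Module):
    """Splits an image into patches and runs a small transformer encoder."""

    def __init__(self, patch_size, embed_dim=256, depth=4, heads=8):
        super().__init__()
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.proj(x)                       # (B, C, H/p, W/p)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, C) patch tokens
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class UpsampleDecoder(nn.Module):
    """Stand-in for the BiMLA decoder: recovers full-resolution features."""

    def __init__(self, embed_dim, scale):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(embed_dim, 64, 3, padding=1),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
        )

    def forward(self, feats):
        return self.up(feats)


class EDTERSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage I: global context from coarse-grained (e.g. 16x16) patches.
        self.global_encoder = PatchTransformerEncoder(patch_size=16)
        self.global_decoder = UpsampleDecoder(256, scale=16)
        # Stage II: local cues from fine-grained (e.g. 4x4) patches.
        self.local_encoder = PatchTransformerEncoder(patch_size=4, depth=2)
        self.local_decoder = UpsampleDecoder(256, scale=4)
        # Feature fusion and decision head, simplified to 1x1 convolutions.
        self.fuse = nn.Conv2d(64 + 64, 64, 1)
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        global_feat = self.global_decoder(self.global_encoder(x))
        local_feat = self.local_decoder(self.local_encoder(x))
        fused = self.fuse(torch.cat([global_feat, local_feat], dim=1))
        return torch.sigmoid(self.head(fused))     # per-pixel edge probability


if __name__ == "__main__":
    edges = EDTERSketch()(torch.randn(1, 3, 64, 64))
    print(edges.shape)  # torch.Size([1, 1, 64, 64])
```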
Related papers
- LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching [8.503217766507584]
A novel convolutional transformer is proposed to capture both local contexts and global structures.
A universal FPN-like framework captures global structures in self-encoder as well as cross-decoder by transformers.
A novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression.
arXiv Detail & Related papers (2023-11-29T12:06:19Z)
- TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
arXiv Detail & Related papers (2023-10-22T07:42:19Z)
- SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection [12.126413875108993]
We propose a cross-modality fusion model SwinNet for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
arXiv Detail & Related papers (2022-04-12T07:37:39Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net [19.21709807149165]
Existing salient object detection (SOD) methods mainly rely on U-shaped convolution neural networks (CNNs) with skip connections.
We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net) to learn both global and local representations for SOD.
ABiU-Net performs favorably against previous state-of-the-art SOD methods.
arXiv Detail & Related papers (2021-08-17T19:45:28Z)
- Unifying Global-Local Representations in Salient Object Detection with Transformer [55.23033277636774]
We introduce a new attention-based encoder, vision transformer, into salient object detection.
With the global view in very shallow layers, the transformer encoder preserves more local representations.
Our method significantly outperforms other FCN-based and transformer-based methods in five benchmarks.
arXiv Detail & Related papers (2021-08-05T17:51:32Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.