A new baseline for edge detection: Make Encoder-Decoder great again
- URL: http://arxiv.org/abs/2409.14976v1
- Date: Mon, 23 Sep 2024 12:54:38 GMT
- Title: A new baseline for edge detection: Make Encoder-Decoder great again
- Authors: Yachuan Li, Xavier Soria Pomab, Yongke Xi, Guanlin Li, Chaozhi Yang, Qian Xiao, Yun Bai, Zongmin LI,
- Abstract summary: The proposed New Baseline for Edge Detection (NBED) achieves superior performance consistently across multiple edge detection benchmarks.
The ODS of NBED on BSDS500 is 0.838, achieving state-of-the-art performance.
- Score: 3.3171122239461193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of deep learning based edge detector has far exceeded that of humans, but the huge computational cost and complex training strategy hinder its further development and application. In this paper, we eliminate these complexities with a vanilla encoder-decoder based detector. Firstly, we design a bilateral encoder to decouple the extraction process of location features and semantic features. Since the location branch no longer provides cues for the semantic branch, the richness of features can be further compressed, which is the key to make our model more compact. We propose a cascaded feature fusion decoder, where the location features are progressively refined by semantic features. The refined location features are the only basis for generating the edge map. The coarse original location features and semantic features are avoided from direct contact with the final result. So the noise in the location features and the location error in the semantic features can be suppressed in the generated edge map. The proposed New Baseline for Edge Detection (NBED) achieves superior performance consistently across multiple edge detection benchmarks, even compared with those methods with huge computational cost and complex training strategy. The ODS of NBED on BSDS500 is 0.838, achieving state-of-the-art performance. Our study shows that what really matters in the current edge detection is high-quality features, and we can make the encoder-decoder based detector great again even without complex training strategies and huge computational cost. The code is available at https://github.com/Li-yachuan/NBED.
Related papers
- Learning to Make Keypoints Sub-Pixel Accurate [80.55676599677824]
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features.
We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features.
arXiv Detail & Related papers (2024-07-16T12:39:56Z) - Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive
Sparse Anchor Generation [50.01244854344167]
We bridge the performance gap between sparse and dense detectors by proposing Adaptive Sparse Anchor Generator (ASAG)
ASAG predicts dynamic anchors on patches rather than grids in a sparse way so that it alleviates the feature conflict problem.
Our method outperforms dense-d ones and achieves a better speed-accuracy trade-off.
arXiv Detail & Related papers (2023-08-18T02:06:49Z) - DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z) - DAC: Detector-Agnostic Spatial Covariances for Deep Local Features [11.494662473750505]
Current deep visual local feature detectors do not model the spatial uncertainty of detected features.
We propose two post-hoc covariance estimates that can be plugged into any pretrained deep feature detector.
arXiv Detail & Related papers (2023-05-20T17:43:09Z) - DPC: Unsupervised Deep Point Correspondence via Cross and Self
Construction [29.191330510706408]
We present a new method for real-time non-rigid dense correspondence between point clouds based on structured shape construction.
Our method, termed Deep Point Correspondence (DPC), requires a fraction of the training data compared to previous techniques.
Our construction scheme leads to a performance boost in comparison to recent state-of-the-art correspondence methods.
arXiv Detail & Related papers (2021-10-16T18:41:13Z) - Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z) - Hybrid-Attention Guided Network with Multiple Resolution Features for
Person Re-Identification [30.285126447140254]
We present a novel person re-ID model that fuses high- and low-level embeddings to reduce the information loss caused in learning high-level features.
We also introduce the spatial and channel attention mechanisms in our model, which aims to mine more discriminative features related to the target.
arXiv Detail & Related papers (2020-09-16T08:12:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.