CarNet: A Lightweight and Efficient Encoder-Decoder Architecture for High-quality Road Crack Detection
- URL: http://arxiv.org/abs/2109.05707v1
- Date: Mon, 13 Sep 2021 05:01:34 GMT
- Title: CarNet: A Lightweight and Efficient Encoder-Decoder Architecture for High-quality Road Crack Detection
- Authors: Kai Li, Yingjie Tian, and Zhiquan Qi
- Abstract summary: We present a lightweight encoder-decoder architecture, CarNet, for efficient and high-quality crack detection.
In particular, we propose that the ideal encoder should present an olive-type distribution of the number of convolutional layers across stages.
In the decoder, we introduce a lightweight up-sampling feature pyramid module to learn rich hierarchical features for crack detection.
- Score: 21.468229247797627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pixel-wise crack detection is a challenging task because cracks
have poor continuity and low contrast. Existing frameworks usually employ
complex models, which achieve good accuracy but low inference efficiency. In
this paper, we present a lightweight encoder-decoder architecture, CarNet, for
efficient and high-quality crack detection. To this end, we first propose that
the ideal encoder should present an olive-type distribution of the number of
convolutional layers across stages. Specifically, after the model input is
compressed in the initial network stage, the number of convolutional layers
trends downward as the encoder stages deepen. Meanwhile, in the decoder, we
introduce a lightweight up-sampling feature pyramid module to learn rich
hierarchical features for crack detection. In particular, we compress the
feature maps of the last three network stages to the same number of channels
and then up-sample them by different factors to a common resolution for
information fusion. Finally, extensive experiments on four public databases,
i.e., Sun520, Rain365, BJN260, and Crack360, demonstrate that CarNet achieves
a good trade-off between inference efficiency and test accuracy compared with
existing state-of-the-art methods.
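The abstract states both design rules in words only. The following Python sketch illustrates them under stated assumptions; it is not CarNet's implementation. `is_olive_type` encodes one possible reading of the olive-type rule (a thin first stage that compresses the input, then layer counts that do not grow as stages deepen), and the layer counts passed to it are hypothetical. `upsampling_feature_pyramid`, `conv1x1`, and `upsample_nearest` are hypothetical helpers: a 1x1 projection brings the last three stage outputs to the same number of channels, integer-factor up-sampling brings them to a common resolution, and the maps are then fused. The abstract does not specify the up-sampling method or the fusion operator, so nearest-neighbour up-sampling and channel concatenation are assumptions, as are the channel widths, spatial sizes, and the random weights standing in for learned kernels.

```python
import numpy as np


def is_olive_type(layers_per_stage):
    """Return True if per-stage conv-layer counts follow one reading of the olive-type rule.

    Assumed reading: a thin first stage compresses the input, and from the
    second stage onward the number of layers does not increase with depth
    and ends lower than it started.
    """
    if len(layers_per_stage) < 3:
        return False
    first, rest = layers_per_stage[0], layers_per_stage[1:]
    starts_thin = first < rest[0]  # initial compression stage is thinner
    non_increasing = all(a >= b for a, b in zip(rest, rest[1:]))
    return starts_thin and non_increasing and rest[0] > rest[-1]


def conv1x1(x, weight):
    """Apply a 1x1 convolution: weight (C_out, C_in) times x (C_in, H, W) per pixel."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def upsampling_feature_pyramid(features, out_channels, rng):
    """Fuse stage outputs (ordered shallow to deep) at the shallowest resolution."""
    target_h = features[0].shape[1]
    resized = []
    for x in features:
        c_in, h, _ = x.shape
        # Random weights stand in for learned 1x1 kernels (hypothetical).
        w = rng.standard_normal((out_channels, c_in)) / np.sqrt(c_in)
        y = conv1x1(x, w)  # compress to a shared channel count
        # Each stage needs its own up-sampling multiple, e.g. 1x, 2x, 4x.
        resized.append(upsample_nearest(y, target_h // h))
    # Assumption: fuse by channel concatenation.
    return np.concatenate(resized, axis=0)


if __name__ == "__main__":
    print(is_olive_type([2, 6, 4, 3]))  # hypothetical layer counts -> True
    rng = np.random.default_rng(0)
    # Hypothetical outputs of the last three encoder stages: (C, H, W).
    stages = [
        rng.standard_normal((64, 32, 32)),
        rng.standard_normal((128, 16, 16)),
        rng.standard_normal((256, 8, 8)),
    ]
    fused = upsampling_feature_pyramid(stages, out_channels=16, rng=rng)
    print(fused.shape)  # (48, 32, 32)
```

Concatenation keeps each stage's contribution in separate channels; element-wise summation is an equally plausible reading of "information fusion" and would return 16 channels instead of 48.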
Related papers
- Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models [59.57732929473519]
We apply multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
We demonstrate that we can generate one encoder output frame for every 2.56 sec of input speech, without significantly affecting word error rate on a large-scale voice search task.
arXiv Detail & Related papers (2024-02-27T03:40:44Z)
- FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information [44.88123177525665]
Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications.
We propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information on the decoder.
Our approach decodes 3 to 10 times faster than other methods.
arXiv Detail & Related papers (2023-12-28T11:12:03Z)
- NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models [29.468888611690346]
We propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models.
Our findings highlight two insights: (1) the number of decoder layers is the dominant factor of inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality.
arXiv Detail & Related papers (2023-10-16T04:27:36Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- PlutoNet: An Efficient Polyp Segmentation Network with Modified Partial Decoder and Decoder Consistency Training [0.40611352512781856]
We propose PlutoNet for polyp segmentation which requires only 2,626,537 parameters, less than 10% of the parameters required by its counterparts.
We train the modified partial decoder and the auxiliary decoder with a combined loss to enforce consistency, which helps improve the encoder's representations.
We perform ablation studies and extensive experiments which show that PlutoNet performs significantly better than the state-of-the-art models.
arXiv Detail & Related papers (2022-04-06T20:29:00Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
- Automatic Crack Detection on Road Pavements Using Encoder Decoder Architecture [9.34360241512198]
The proposed algorithm uses an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN).
The network automatically learns crack characteristics from multiple levels of context information and performs end-to-end crack detection.
arXiv Detail & Related papers (2020-07-01T13:32:23Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)