End-to-end training of a two-stage neural network for defect detection
- URL: http://arxiv.org/abs/2007.07676v1
- Date: Wed, 15 Jul 2020 13:42:26 GMT
- Title: End-to-end training of a two-stage neural network for defect detection
- Authors: Jakob Božič, Domen Tabernik and Danijel Skočaj
- Abstract summary: A segmentation-based, two-stage neural network has shown excellent results in surface defect detection.
We introduce end-to-end training of the two-stage network together with several extensions to the training process.
We show state-of-the-art results on three defect detection datasets.
- Score: 4.38301148531795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A segmentation-based, two-stage neural network has shown excellent
results in surface defect detection, enabling the network to learn from a
relatively small number of samples. In this work, we introduce end-to-end
training of the two-stage network together with several extensions to the
training process, which reduce the training time and improve the results on
surface defect detection tasks. To enable end-to-end training, we carefully
balance the contributions of the segmentation and the classification losses
throughout learning. We adjust the gradient flow from the classification
network into the segmentation network to prevent unstable features from
corrupting the learning. As an additional extension, we propose a
frequency-of-use sampling scheme for negative samples to address the over- and
under-sampling of images during training, and we employ the distance transform
algorithm on the region-based segmentation masks as weights for positive
pixels, giving greater importance to areas with a higher probability of
containing a defect without requiring detailed annotation. We demonstrate the
performance of the end-to-end training scheme and the proposed extensions on
three defect detection datasets - DAGM, KolektorSDD and the Severstal Steel
defect dataset - where we show state-of-the-art results. On DAGM and
KolektorSDD we achieve a 100% detection rate, completely solving these
datasets. An additional ablation study performed on all three datasets
quantitatively demonstrates each proposed extension's contribution to the
overall improvement.
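The three training extensions in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the helper names (`balanced_loss`, `distance_weights`, `sample_negatives`) are made up, and a simple Manhattan-distance BFS stands in for the exact Euclidean distance transform used in the paper.

```python
# Hypothetical sketch of the three training extensions described above:
# loss balancing across epochs, distance-transform weights for positive
# pixels, and frequency-of-use sampling of negative images.
from collections import deque

def balanced_loss(seg_loss, cls_loss, epoch, n_epochs):
    """Linearly shift emphasis from the segmentation loss (early epochs)
    to the classification loss (late epochs)."""
    delta = epoch / max(n_epochs - 1, 1)
    return (1 - delta) * seg_loss + delta * cls_loss

def distance_weights(mask):
    """Per-pixel weights for positive pixels via a multi-source BFS from the
    background (Manhattan distance as a stand-in for the exact Euclidean
    distance transform). Pixels deep inside an annotated region get weights
    near 1; pixels near its boundary get smaller weights. Assumes the mask
    contains at least one background (zero) pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[i][j] == 0 else None for j in range(w)] for i in range(h)]
    q = deque((i, j) for i in range(h) for j in range(w) if mask[i][j] == 0)
    while q:
        i, j = q.popleft()
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                q.append((ni, nj))
    peak = max(d for row in dist for d in row) or 1  # avoid /0 on empty masks
    return [[dist[i][j] / peak if mask[i][j] else 0.0 for j in range(w)]
            for i in range(h)]

def sample_negatives(use_counts, k):
    """Frequency-of-use sampling: pick the k negatives used least often so
    far, then bump their counters, so every negative image is revisited
    evenly over training rather than over- or under-sampled."""
    chosen = sorted(range(len(use_counts)), key=lambda i: use_counts[i])[:k]
    for i in chosen:
        use_counts[i] += 1
    return chosen
```

For example, on a 5x5 mask with a 3x3 defect region, the center pixel receives weight 1.0 and the region's border pixels 0.5, mirroring the paper's idea that interior pixels are more likely to truly belong to the defect than loosely annotated boundary pixels.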
Related papers
- A Scalable and Generalized Deep Learning Framework for Anomaly Detection in Surveillance Videos [0.47279903800557493]
Anomaly detection in videos is challenging due to the complexity, noise, and diverse nature of activities such as violence, shoplifting, and vandalism.
Existing approaches have struggled to apply deep learning models across different anomaly tasks without extensive retraining.
A new DL framework is introduced in this study, consisting of three key components: transfer learning to enhance feature generalization, model fusion to improve feature representation, and multi-task classification.
Empirical evaluations demonstrate the framework's effectiveness, achieving an accuracy of 97.99% on the RLVS dataset (violence detection) and 83.59% on the UCF dataset (shoplifting detection).
arXiv Detail & Related papers (2024-07-17T22:41:12Z) - Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets [20.538287907723713]
Anomalous crack region detection is a typical binary semantic segmentation task, which aims to detect pixels representing cracks on pavement surface images automatically by algorithms.
Existing deep learning-based methods have achieved outstanding results on specific public pavement datasets, but their performance deteriorates dramatically on imbalanced datasets.
We propose a deep learning framework based on conditional Generative Adversarial Networks (cGANs) for the anomalous crack region detection tasks at the pixel level.
arXiv Detail & Related papers (2024-02-03T19:24:40Z) - 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z) - Learning Compact Features via In-Training Representation Alignment [19.273120635948363]
In each epoch, the true gradient of the loss function is estimated using a mini-batch sampled from the training set.
We propose In-Training Representation Alignment (ITRA) that explicitly aligns feature distributions of two different mini-batches with a matching loss.
We also provide a rigorous analysis of the desirable effects of the matching loss on feature representation learning.
arXiv Detail & Related papers (2022-11-23T22:23:22Z) - Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z) - Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation [90.2445084743881]
We present a method for semi-supervised point cloud semantic segmentation to adopt unlabeled point clouds in training to boost the model performance.
Inspired by the recent contrastive loss in self-supervised tasks, we propose the guided point contrastive loss to enhance the feature representation and model generalization ability.
arXiv Detail & Related papers (2021-10-15T16:38:54Z) - Point Discriminative Learning for Unsupervised Representation Learning on 3D Point Clouds [54.31515001741987]
We propose a point discriminative learning method for unsupervised representation learning on 3D point clouds.
We achieve this by imposing a novel point discrimination loss on the middle level and global level point features.
Our method learns powerful representations and achieves new state-of-the-art performance.
arXiv Detail & Related papers (2021-08-04T15:11:48Z) - Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z) - InverseForm: A Loss Function for Structured Boundary-Aware Segmentation [80.39674800972182]
We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network.
This plug-in loss term complements the cross-entropy loss in capturing boundary transformations.
We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks.
arXiv Detail & Related papers (2021-04-06T18:52:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.