Combating noisy labels in object detection datasets
- URL: http://arxiv.org/abs/2211.13993v3
- Date: Mon, 11 Dec 2023 12:46:04 GMT
- Title: Combating noisy labels in object detection datasets
- Authors: Krystian Chachuła, Jakub Łyskawa, Bartłomiej Olber, Piotr Frątczak, Adam Popowicz, Krystian Radlak
- Abstract summary: We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets.
We identify missing, spurious, mislabeled, and mislocated bounding boxes and suggest corrections.
The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The quality of training datasets for deep neural networks is a key factor
contributing to the accuracy of resulting models. This effect is amplified in
difficult tasks such as object detection. Dealing with errors in datasets is
often limited to accepting that some fraction of examples are incorrect,
estimating their confidence, and either assigning appropriate weights or
ignoring uncertain ones during training. In this work, we propose a different
approach. We introduce the Confident Learning for Object Detection (CLOD)
algorithm for assessing the quality of each label in object detection datasets,
identifying missing, spurious, mislabeled, and mislocated bounding boxes and
suggesting corrections. By focusing on finding incorrect examples in the
training datasets, we can eliminate them at the root. Suspicious bounding boxes
can be reviewed to improve the quality of the dataset, leading to better models
without further complicating their already complex architectures. The proposed
method is able to point out nearly 80% of artificially disturbed bounding boxes
with a false positive rate below 0.1. Cleaning the datasets by applying the
most confident automatic suggestions improved mAP scores by 16% to 46%,
depending on the dataset, without any modifications to the network
architectures. This approach shows promising potential in rectifying
state-of-the-art object detection datasets.
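The abstract describes cross-checking dataset annotations against detector predictions to flag the four error types. A minimal sketch of that idea follows, assuming IoU-based matching between annotations and predictions from a model not trained on the image in question; the helper names (`iou`, `flag_boxes`) and thresholds are illustrative, not the paper's actual CLOD algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def flag_boxes(annotations, predictions, iou_thr=0.5, loc_thr=0.8):
    """Flag suspicious annotations by comparing them to model predictions.

    annotations, predictions: lists of (box, class_label) pairs. Predictions
    are assumed to come from a cross-validated model, so confident
    predictions approximate the true labels. Returns (kind, item) findings.
    """
    findings, matched = [], set()
    for box, label in annotations:
        best_j, best_iou = None, 0.0
        for j, (pbox, _) in enumerate(predictions):
            v = iou(box, pbox)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is None or best_iou < iou_thr:
            # no prediction supports this annotation -> likely spurious box
            findings.append(("spurious", (box, label)))
        else:
            matched.add(best_j)
            if predictions[best_j][1] != label:
                findings.append(("mislabeled", (box, label)))
            elif best_iou < loc_thr:
                # overlaps the prediction, but poorly -> mislocated
                findings.append(("mislocated", (box, label)))
    for j, (pbox, plabel) in enumerate(predictions):
        if j not in matched:
            # the model sees an object with no annotation -> missing label
            findings.append(("missing", (pbox, plabel)))
    return findings
```

Flagged boxes would then be queued for human review rather than silently corrected, matching the abstract's emphasis on reviewing suspicious labels.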
Related papers
- Revising the Problem of Partial Labels from the Perspective of CNNs' Robustness [6.46250754192468]
We introduce a lightweight partial-label solution using pseudo-labeling techniques and a designed loss function.
We employ D-Score to analyze both the proposed and existing methods to determine whether they can enhance robustness while improving accuracy.
arXiv Detail & Related papers (2024-07-24T20:39:17Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation [15.333573151694576]
We propose a novel domain adaptation (DA) approach to address the problem of labeled training data scarcity for a target learning task.
Our approach works for scenarios where the source dataset and the dataset available for the target learning task have the same or different feature spaces.
We evaluate our combined approach using image data for wafer defect prediction.
arXiv Detail & Related papers (2023-05-31T21:45:34Z)
- Knowledge Combination to Learn Rotated Detection Without Rotated Annotation [53.439096583978504]
Rotated bounding boxes drastically reduce output ambiguity of elongated objects.
Despite the effectiveness, rotated detectors are not widely employed.
We propose a framework that allows the model to predict precise rotated boxes.
arXiv Detail & Related papers (2023-04-05T03:07:36Z)
- TransferD2: Automated Defect Detection Approach in Smart Manufacturing using Transfer Learning Techniques [1.8899300124593645]
We propose a transfer learning approach, namely TransferD2, to correctly identify defects on a dataset of source objects.
Our proposed approach can be applied in defect detection applications where insufficient data is available for training a model and can be extended to identify imperfections in new unseen data.
arXiv Detail & Related papers (2023-02-26T13:24:46Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Assessing the Quality of the Datasets by Identifying Mislabeled Samples [14.881597737762316]
We propose a novel statistic -- noise score -- as a measure for the quality of each data point to identify mislabeled samples.
In our work, we use the representations derived by the inference network of a data-quality-supervised variational autoencoder (AQUAVS).
We validate our proposed statistic through experimentation by corrupting MNIST, FashionMNIST, and CIFAR10/100 datasets.
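Validating a mislabeled-sample detector, as this paper does on MNIST, FashionMNIST, and CIFAR10/100, requires injecting label noise at known positions so that precision and recall can be measured. A minimal sketch of symmetric label corruption, under the assumption of a uniform flip to a different class; the function name `corrupt_labels` is illustrative:

```python
import random

def corrupt_labels(labels, num_classes, noise_rate, seed=0):
    """Flip each label with probability `noise_rate` to a uniformly
    chosen *different* class (symmetric label noise).
    Returns (noisy_labels, corrupted_indices) so detection quality
    can later be scored against the known corruption set."""
    rng = random.Random(seed)
    noisy, corrupted = list(labels), []
    for i, y in enumerate(labels):
        if rng.random() < noise_rate:
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
            corrupted.append(i)
    return noisy, corrupted
```

A detector's flagged indices can then be compared directly against `corrupted_indices` to compute false positive and detection rates.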
arXiv Detail & Related papers (2021-09-10T17:14:09Z)
- Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid-updating scheme, and match the accuracy of softmax models.
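The summary describes rejecting out-of-distribution inputs in a single forward pass by comparing a deterministic feature vector against per-class centroids. A minimal sketch of that distance-to-centroid idea, assuming fixed Euclidean centroids and an RBF similarity; the names (`fit_centroids`, `predict_or_reject`) and the fixed `sigma`/`threshold` are illustrative simplifications, not the paper's learned scheme:

```python
import math

def fit_centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_or_reject(x, centroids, sigma=1.0, threshold=0.5):
    """One pass over the centroids: RBF similarity to each class.
    Returns (class, score), or (None, score) when every class is too
    far away, i.e. the input is treated as out-of-distribution."""
    best_y, best_score = None, 0.0
    for y, c in centroids.items():
        d2 = sum((a - b) ** 2 for a, b in zip(x, c))
        score = math.exp(-d2 / (2 * sigma ** 2))
        if score > best_score:
            best_y, best_score = y, score
    if best_score < threshold:
        return None, best_score  # reject: no centroid is close enough
    return best_y, best_score
```

Because the rejection decision needs only distances computed during the same forward pass, no sampling or ensemble is required, which is the efficiency argument the summary makes.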
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252]
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality by examining how dataset size and label noise affect model confidence.
arXiv Detail & Related papers (2020-02-23T05:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.