A First Look at Dataset Bias in License Plate Recognition
- URL: http://arxiv.org/abs/2208.10657v1
- Date: Tue, 23 Aug 2022 00:20:33 GMT
- Title: A First Look at Dataset Bias in License Plate Recognition
- Authors: Rayson Laroca, Marcelo Santos, Valter Estevam, Eduardo Luz, David Menotti
- Abstract summary: Dataset bias has been recognized as a severe problem in the computer vision community.
This work investigates the dataset bias problem in the License Plate Recognition context.
- Score: 1.8496815029347666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Public datasets have played a key role in advancing the state of the art in
License Plate Recognition (LPR). Although dataset bias has been recognized as a
severe problem in the computer vision community, it has been largely overlooked
in the LPR literature. LPR models are usually trained and evaluated separately
on each dataset. In this scenario, they often prove robust on the dataset they
were trained on but show limited performance on unseen ones. Therefore,
this work investigates the dataset bias problem in the LPR context. We
performed experiments on eight datasets, four collected in Brazil and four in
mainland China, and observed that each dataset has a unique, identifiable
"signature" since a lightweight classification model predicts the source
dataset of a license plate (LP) image with more than 95% accuracy. In our
discussion, we draw attention to the fact that most LPR models are probably
exploiting such signatures to improve the results achieved in each dataset at
the cost of losing generalization capability. These results emphasize the
importance of evaluating LPR models in cross-dataset setups, as they provide a
better indication of generalization (hence real-world performance) than
within-dataset ones.
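To make the "signature" finding concrete, the sketch below trains a lightweight CNN to predict which dataset an LP crop comes from. The directory layout (data/train/<dataset_name>/*.jpg), architecture, input size, and hyperparameters are illustrative assumptions; the paper does not prescribe them here.

```python
# Minimal sketch (assumptions, not the paper's exact setup): a tiny CNN
# trained to predict the *source dataset* of a license-plate crop.
# Assumes a hypothetical layout data/train/<dataset_name>/*.jpg,
# one folder per dataset, so folder names become class labels.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((32, 96)),  # LP crops are short and wide
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)  # hypothetical path
loader = DataLoader(train_set, batch_size=64, shuffle=True)

class SignatureNet(nn.Module):
    """Deliberately small: the point is that even a light model separates datasets."""
    def __init__(self, n_datasets: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 12)),
        )
        self.classifier = nn.Linear(32 * 4 * 12, n_datasets)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SignatureNet(n_datasets=len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, dataset_ids in loader:  # labels are dataset indices, not LP text
        opt.zero_grad()
        loss = loss_fn(model(images), dataset_ids)
        loss.backward()
        opt.step()
```

High held-out accuracy from a model this small is the red flag the paper describes: the datasets are separable from low-level cues alone.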
Related papers
- Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead [1.3995965887921709]
We analyze demographic biases across five models and six datasets.
Portrait datasets like UTKFace and CelebA are the best tools for bias detection.
We introduce a more difficult version of VisoGender to serve as a more rigorous evaluation.
arXiv Detail & Related papers (2024-10-17T02:03:27Z)
- On the Universal Truthfulness Hyperplane Inside LLMs [27.007142483859162]
We investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model.
Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios.
arXiv Detail & Related papers (2024-07-11T15:07:26Z)
- Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition [4.6425780769024945]
This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research.
Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits.
These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR.
arXiv Detail & Related papers (2023-04-10T15:24:29Z)
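As a rough illustration of how such near-duplicates can be surfaced, the sketch below compares perceptual hashes across a train/test split. The imagehash-based approach, paths, and threshold are illustrative assumptions, not necessarily the detection method used in the paper.

```python
# Sketch (assumptions): flag near-duplicates across a train/test split via
# perceptual hashing. Directories and threshold are hypothetical.
from pathlib import Path
from PIL import Image
import imagehash  # pip install imagehash

def hashes(folder):
    return {p: imagehash.phash(Image.open(p)) for p in Path(folder).glob("*.jpg")}

train_hashes = hashes("lpr_data/train")  # hypothetical directories
test_hashes = hashes("lpr_data/test")

THRESHOLD = 8  # max Hamming distance to call two images near-duplicates
for test_path, th in test_hashes.items():
    for train_path, trh in train_hashes.items():
        # imagehash overloads '-' as the Hamming distance between hashes
        if th - trh <= THRESHOLD:
            print(f"near-duplicate: {test_path} ~ {train_path}")
```

The quadratic scan is fine for small sets; at scale, the usual fix is to index hashes rather than compare all pairs.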
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- On the Cross-dataset Generalization in License Plate Recognition [1.8514314381314887]
We propose a traditional-split versus leave-one-dataset-out experimental setup to empirically assess the cross-dataset generalization of 12 OCR models.
Results shed light on the limitations of the traditional-split protocol for evaluating approaches in the ALPR context.
arXiv Detail & Related papers (2022-01-02T00:56:09Z)
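The leave-one-dataset-out half of that setup is simple to express in code; below is a schematic sketch in which train_model and evaluate are hypothetical stand-ins for a real OCR training/evaluation pipeline.

```python
# Schematic sketch of leave-one-dataset-out evaluation: train on all
# datasets but one, test on the held-out one. Helper functions and
# dataset names are hypothetical placeholders.
DATASETS = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]  # placeholders

def train_model(train_sets):
    """Hypothetical helper: train an OCR model on the given datasets."""
    return {"trained_on": tuple(train_sets)}  # stand-in for a model object

def evaluate(model, test_set):
    """Hypothetical helper: return the recognition rate on test_set."""
    return 0.0  # stand-in score

results = {}
for held_out in DATASETS:
    train_sets = [d for d in DATASETS if d != held_out]
    model = train_model(train_sets)
    # Cross-dataset score: performance on a dataset never seen in training
    results[held_out] = evaluate(model, held_out)
print(results)
```

Comparing these held-out scores against traditional-split scores is what exposes the generalization gap.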
- Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z)
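A simplified sketch of the mixed-capacity idea, phrased as a product of experts: the low-capacity model is trained jointly so it absorbs dataset-specific shortcuts, and only the main model is used at test time. Model sizes are placeholders, and extras from the paper (e.g., regularization of the bias model) are omitted.

```python
# Simplified sketch (assumptions): mixed-capacity ensemble as a product of
# experts. The low-capacity model soaks up dataset-specific patterns during
# joint training; evaluation uses the main model alone.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features, n_classes = 128, 10  # placeholder sizes
main_model = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU(),
                           nn.Linear(256, n_classes))
bias_model = nn.Linear(n_features, n_classes)  # deliberately low capacity

opt = torch.optim.Adam(
    list(main_model.parameters()) + list(bias_model.parameters()), lr=1e-3)

def train_step(x, y):
    # Product of experts: sum the two models' log-probabilities;
    # cross_entropy renormalizes the combined scores internally.
    joint = F.log_softmax(main_model(x), dim=-1) + F.log_softmax(bias_model(x), dim=-1)
    loss = F.cross_entropy(joint, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def predict(x):
    # At test time the bias model is dropped entirely.
    return main_model(x).argmax(dim=-1)

# Tiny smoke test on random placeholder data
x = torch.randn(32, n_features)
y = torch.randint(0, n_classes, (32,))
train_step(x, y)
```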
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
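The two Data Maps coordinates are cheap to compute once the model's probability of the gold label is logged per epoch; a minimal sketch on placeholder data:

```python
# Sketch (placeholder data): Data Maps statistics per training instance.
# probs_per_epoch[e, i] stands for p(gold label | instance i) at epoch e,
# which a real pipeline would log during training.
import numpy as np

n_epochs, n_instances = 10, 1000
rng = np.random.default_rng(0)
probs_per_epoch = rng.uniform(size=(n_epochs, n_instances))  # placeholder

confidence = probs_per_epoch.mean(axis=0)   # high => "easy to learn"
variability = probs_per_epoch.std(axis=0)   # high => "ambiguous"
# Binary proxy for correctness (fraction of epochs with p > 0.5); the real
# definition checks whether the model's argmax matched the gold label.
correctness = (probs_per_epoch > 0.5).mean(axis=0)

# Low-confidence instances are candidates for mislabeling / "hard to learn"
hard_to_learn = np.argsort(confidence)[:50]
```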
- On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets [0.0]
This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets.
Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias.
arXiv Detail & Related papers (2020-08-26T14:16:01Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on the enlarged dataset, we propose a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
We introduce a new reading comprehension dataset requiring logical reasoning (ReClor), extracted from standardized graduate admission examinations.
We propose to identify biased data points and separate them into an EASY set, with the rest forming a HARD set.
Empirical results show that state-of-the-art models readily capture the biases contained in the dataset, achieving high accuracy on the EASY set.
However, they struggle on the HARD set, with performance close to random guessing, indicating that more research is needed to truly enhance the logical reasoning ability of current models.
arXiv Detail & Related papers (2020-02-11T11:54:29Z)
- Adversarial Filters of Dataset Biases [96.090959788952]
Large neural models have demonstrated human-level performance on language and vision benchmarks.
Their performance degrades considerably on adversarial or out-of-distribution samples.
We propose AFLite, which adversarially filters such dataset biases.
arXiv Detail & Related papers (2020-02-10T21:59:21Z)
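A simplified sketch of AFLite-style filtering over pre-computed feature embeddings: repeatedly fit linear probes on random partitions and drop the instances the probes find most predictable. All data and hyperparameters below are placeholders, not the paper's configuration.

```python
# Simplified sketch (placeholders): AFLite-style adversarial filtering.
# Instances that simple linear probes classify correctly from embeddings
# alone are treated as bias-carrying and removed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))      # placeholder embeddings
y = rng.integers(0, 2, size=2000)    # placeholder binary labels
keep = np.arange(len(X))             # indices of instances still kept

for _ in range(5):                   # filtering iterations
    hits = np.zeros(len(keep))
    counts = np.zeros(len(keep))
    for _ in range(10):              # random train/holdout partitions
        perm = rng.permutation(len(keep))
        tr, ho = perm[: len(keep) // 2], perm[len(keep) // 2 :]
        clf = LogisticRegression(max_iter=200).fit(X[keep][tr], y[keep][tr])
        hits[ho] += clf.predict(X[keep][ho]) == y[keep][ho]
        counts[ho] += 1
    predictability = hits / np.maximum(counts, 1)
    # Keep the 90% least predictable instances, dropping the rest
    keep = keep[np.argsort(predictability)[: int(0.9 * len(keep))]]

print(f"{len(keep)} of {len(X)} instances survive filtering")
```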