A Review of Benchmarks for Visual Defect Detection in the Manufacturing
Industry
- URL: http://arxiv.org/abs/2305.13261v1
- Date: Fri, 5 May 2023 07:44:23 GMT
- Title: A Review of Benchmarks for Visual Defect Detection in the Manufacturing
Industry
- Authors: Philippe Carvalho (Roberval), Alexandre Durupt (Roberval), Yves
Grandvalet (Heudiasyc)
- Abstract summary: We propose a study of existing benchmarks to compare and expose their characteristics and their use-cases.
A study of industrial metrics requirements, as well as testing procedures, will be presented and applied to the studied benchmarks.
- Score: 63.52264764099532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of industrial defect detection using machine learning and deep
learning is a subject of active research. Datasets, also called benchmarks, are
used to compare and assess research results. There are a number of datasets of
varying quality in industrial visual inspection, which makes it difficult to
determine which dataset to use. Generally speaking, datasets that include a
testing set, provide precise labeling, and are made in real-world conditions should be
preferred. We propose a study of existing benchmarks to compare and expose
their characteristics and their use-cases. A study of industrial metrics
requirements, as well as testing procedures, will be presented and applied to
the studied benchmarks. We discuss our findings by examining the current state
of benchmarks for industrial visual inspection, and by presenting guidelines on
the usage of benchmarks.
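As an illustration of the kind of industrial metrics the abstract refers to, the sketch below computes two scores commonly reported on visual defect detection benchmarks such as MVTec AD: image-level AUROC and the false-positive rate at a fixed recall constraint. This is a minimal sketch, not the evaluation procedure of the paper; the 95% recall target, function names, and toy data are assumptions added for illustration.

```python
# Minimal sketch (assumptions, not the paper's protocol): two metrics often
# reported on industrial visual-inspection benchmarks, computed from per-image
# anomaly scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def image_level_auroc(labels, scores):
    """AUROC over per-image anomaly scores (1 = defective, 0 = good)."""
    return roc_auc_score(labels, scores)

def fpr_at_recall(labels, scores, min_recall=0.95):
    """Lowest false-positive rate reachable while still detecting at least
    `min_recall` of the defective images (the 0.95 target is illustrative)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    feasible = tpr >= min_recall
    return float(fpr[feasible].min()) if feasible.any() else 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0] * 80 + [1] * 20)            # 20 defective images out of 100
    scores = rng.normal(loc=labels * 1.5, scale=1.0)  # toy anomaly scores
    print("image-level AUROC:", image_level_auroc(labels, scores))
    print("FPR @ 95% recall :", fpr_at_recall(labels, scores))
```

A recall-constrained operating point is one common way to encode industrial requirements, since a missed defect usually costs more than a false alarm; the exact constraint depends on the production line.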
Related papers
- How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs [60.25940747590386]
We propose How2Bench, which comprises a 55-criterion checklist serving as a set of guidelines to govern the development of code-related benchmarks comprehensively.
We profiled 274 benchmarks released within the past decade and found concerning issues.
Nearly 70% of the benchmarks took no measures for data quality assurance; over 10% were not open-sourced at all, or only partially.
arXiv Detail & Related papers (2025-01-18T09:51:57Z)
- More than Marketing? On the Information Value of AI Benchmarks for Practitioners [42.73526862595375]
In academia, public benchmarks were generally viewed as suitable measures for capturing research progress.
In product and policy, benchmarks were often found to be inadequate for informing substantive decisions.
We conclude that effective benchmarks should provide meaningful, real-world evaluations, incorporate domain expertise, and maintain transparency in scope and goals.
arXiv Detail & Related papers (2024-12-07T03:35:39Z)
- Benchmark Data Repositories for Better Benchmarking [26.15831504718431]
In machine learning research, it is common to evaluate algorithms via their performance on benchmark datasets.
We analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking.
arXiv Detail & Related papers (2024-10-31T16:30:08Z)
- Do Text-to-Vis Benchmarks Test Real Use of Visualisations? [11.442971909006657]
This paper investigates whether benchmarks reflect real-world use through an empirical study comparing benchmark datasets with code from public repositories.
Our findings reveal a substantial gap, with evaluations not testing the same distribution of chart types, attributes, and actions as real-world examples.
One dataset is representative, but requires extensive modification to become a practical end-to-end benchmark.
This shows that new benchmarks are needed to support the development of systems that truly address users' visualisation needs.
arXiv Detail & Related papers (2024-07-29T06:13:28Z)
- ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z)
- TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs [12.839640915518443]
Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost.
Recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs.
We propose Private Benchmarking, a solution where test datasets are kept private and models are evaluated without revealing the test data to the model.
arXiv Detail & Related papers (2024-03-01T09:28:38Z)
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
- TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction [131.7684896032888]
We present TextEE, a standardized, fair, and reproducible benchmark for event extraction.
TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains.
We evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance.
arXiv Detail & Related papers (2023-11-16T04:43:03Z)
- Don't Make Your LLM an Evaluation Benchmark Cheater [142.24553056600627]
Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvements in model capacity.
To assess the model performance, a typical approach is to construct evaluation benchmarks for measuring the ability level of LLMs.
We discuss the potential risk and impact of inappropriately using evaluation benchmarks and misleadingly interpreting the evaluation results.
arXiv Detail & Related papers (2023-11-03T14:59:54Z)
- AI applications in forest monitoring need remote sensing benchmark datasets [0.0]
We present requirements and considerations for the creation of rigorous, useful benchmarking datasets for forest monitoring applications.
We list a set of example large-scale datasets that could contribute to benchmarking, and present a vision for how community-driven, representative benchmarking initiatives could benefit the field.
arXiv Detail & Related papers (2022-12-20T01:11:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.