Mapping global dynamics of benchmark creation and saturation in
artificial intelligence
- URL: http://arxiv.org/abs/2203.04592v1
- Date: Wed, 9 Mar 2022 09:16:49 GMT
- Title: Mapping global dynamics of benchmark creation and saturation in
artificial intelligence
- Authors: Adriano Barbosa-Silva, Simon Ott, Kathrin Blagec, Jan Brauner,
Matthias Samwald
- Abstract summary: We create maps of the global dynamics of benchmark creation and saturation.
We curated data for 1688 benchmarks covering the entire domains of computer vision and natural language processing.
- Score: 5.233652342195164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Benchmarks are crucial to measuring and steering progress in artificial
intelligence (AI). However, recent studies raised concerns over the state of AI
benchmarking, reporting issues such as benchmark overfitting, benchmark
saturation and increasing centralization of benchmark dataset creation. To
facilitate monitoring of the health of the AI benchmarking ecosystem, we
introduce methodologies for creating condensed maps of the global dynamics of
benchmark creation and saturation. We curated data for 1688 benchmarks covering
the entire domains of computer vision and natural language processing, and show
that a large fraction of benchmarks quickly trended towards near-saturation,
that many benchmarks fail to find widespread utilization, and that benchmark
performance gains for different AI tasks were prone to unforeseen bursts. We
conclude that future work should focus on large-scale community collaboration
and on mapping benchmark performance gains to real-world utility and impact of
AI.
Related papers
- BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices [28.70453947993952]
We develop an assessment framework considering 46 best practices across an AI benchmark's lifecycle and evaluate 24 AI benchmarks against it.
We find that there exist large quality differences and that commonly used benchmarks suffer from significant issues.
arXiv Detail & Related papers (2024-11-20T02:38:24Z) - OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework [21.87740178652843]
Causal discovery offers a promising approach to improve transparency and reliability.
We propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects.
We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive optimization of algorithms.
arXiv Detail & Related papers (2024-06-07T03:09:22Z) - A Review of Benchmarks for Visual Defect Detection in the Manufacturing
Industry [63.52264764099532]
We propose a study of existing benchmarks to compare and expose their characteristics and their use-cases.
A study of industrial metrics requirements, as well as testing procedures, will be presented and applied to the studied benchmarks.
arXiv Detail & Related papers (2023-05-05T07:44:23Z) - Benchmarks for Automated Commonsense Reasoning: A Survey [0.0]
More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of AI systems.
This paper surveys the development and uses of AI commonsense benchmarks.
arXiv Detail & Related papers (2023-02-09T16:34:30Z) - IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing [88.35145788575348]
Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing.
The lack of a uniform IM benchmark is hindering the development and usage of IAD methods in real-world applications.
We construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets.
arXiv Detail & Related papers (2023-01-31T01:24:45Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z) - The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z) - AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factors space that impacts the learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z) - Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.