AIBench Scenario: Scenario-distilling AI Benchmarking
- URL: http://arxiv.org/abs/2005.03459v4
- Date: Sun, 5 Sep 2021 13:44:38 GMT
- Title: AIBench Scenario: Scenario-distilling AI Benchmarking
- Authors: Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao,
Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang
- Abstract summary: We formalize a real-world application scenario as a Directed Acyclic Graph-based model.
We propose rules to distill it into a permutation of essential AI and non-AI tasks, which we call a scenario benchmark.
We implement two Internet service AI scenario benchmarks based on the framework as proxies to two real-world application scenarios.
- Score: 8.909947747424672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern real-world application scenarios like Internet services consist of a
diversity of AI and non-AI modules with huge code sizes and long, complicated
execution paths, which raises serious benchmarking and evaluation challenges.
Using AI component or micro benchmarks alone can lead to error-prone
conclusions. This paper presents a methodology to address this challenge. We
formalize a real-world application scenario as a Directed Acyclic Graph-based
model and propose rules to distill it into a permutation of essential AI and
non-AI tasks, which we call a scenario benchmark. Together with seventeen
industry partners, we extract nine typical scenario benchmarks. We design and
implement an extensible, configurable, and flexible benchmark framework. We
implement two Internet service AI scenario benchmarks based on the framework
as proxies to two real-world application scenarios. We consider scenario,
component, and micro benchmarks as three indispensable parts of evaluation.
Our evaluation shows the advantage of our methodology over using component or
micro AI benchmarks alone. The specifications, source code, testbed, and
results are publicly available at
https://www.benchcouncil.org/aibench/scenario/.
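To make the DAG-based formalization concrete, below is a minimal Python sketch. It is not the AIBench implementation; the task names and graph structure are hypothetical. It models a simplified Internet-service scenario as a DAG of AI and non-AI tasks and distills it into one valid permutation of tasks via topological ordering.

    # Illustrative sketch only (not AIBench code): a scenario modeled as a
    # DAG of AI and non-AI tasks, distilled into a permutation of tasks.
    from graphlib import TopologicalSorter  # Python 3.9+ standard library

    # Hypothetical dependency map for a simplified Internet service:
    # each task maps to the set of tasks it depends on.
    scenario_dag = {
        "query_parsing":        set(),                     # non-AI task
        "recommendation_model": {"query_parsing"},         # AI task
        "ranking_model":        {"recommendation_model"},  # AI task
        "result_rendering":     {"ranking_model"},         # non-AI task
    }

    # A topological order of the DAG is one valid permutation of the
    # essential tasks, i.e. a candidate scenario-benchmark execution path.
    permutation = list(TopologicalSorter(scenario_dag).static_order())
    print(permutation)
    # ['query_parsing', 'recommendation_model', 'ranking_model',
    #  'result_rendering']

Here a plain topological sort stands in for the distillation step; the paper's actual rules for reducing a scenario DAG to a scenario benchmark are specified in the publicly released framework.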
Related papers
- UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation [66.05528698010697]
Test-Time Adaptation aims to adapt pre-trained models to the target domain during testing.
Researchers have identified various challenging scenarios and developed diverse methods to address these challenges.
We propose a Unified Test-Time Adaptation benchmark, which is comprehensive and widely applicable.
arXiv Detail & Related papers (2024-07-29T15:04:53Z)
- Alice Benchmarks: Connecting Real World Re-Identification with the Synthetic [92.02220105679713]
We introduce the Alice benchmarks, large-scale datasets providing benchmarks and evaluation protocols to the research community.
Within the Alice benchmarks, two object re-ID tasks are offered: person and vehicle re-ID.
As an important feature of our real target domain, the clusterability of its training set is not manually guaranteed, which makes it closer to a real domain adaptation test scenario.
arXiv Detail & Related papers (2023-10-06T17:58:26Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It outperforms baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- UMSE: Unified Multi-scenario Summarization Evaluation [52.60867881867428]
Summarization quality evaluation is a non-trivial task in text summarization.
We propose the Unified Multi-scenario Summarization Evaluation Model (UMSE).
Our UMSE is the first unified summarization evaluation framework that can be applied across three evaluation scenarios.
arXiv Detail & Related papers (2023-05-26T12:54:44Z) - Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction [67.54420015049732]
Aspect Sentiment Triplet Extraction (ASTE) is a challenging task in sentiment analysis, aiming to provide fine-grained insights into human sentiments.
Existing benchmarks are limited to two domains and do not evaluate model performance on unseen domains.
We introduce a domain-expanded benchmark by annotating samples from diverse domains, enabling evaluation of models in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2023-05-23T18:01:49Z) - Mystique: Enabling Accurate and Scalable Generation of Production AI
Benchmarks [2.0315147707806283]
Mystique is an accurate and scalable framework for production AI benchmark generation.
Mystique is scalable thanks to its lightweight data collection, both in runtime overhead and in instrumentation effort.
We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models.
arXiv Detail & Related papers (2022-12-16T18:46:37Z) - From Multi-agent to Multi-robot: A Scalable Training and Evaluation
Platform for Multi-robot Reinforcement Learning [12.74238738538799]
Multi-agent reinforcement learning (MARL) has been gaining extensive attention from academia and industries in the past few decades.
However, it remains unknown how these methods perform in real-world scenarios, especially in multi-robot systems.
This paper introduces a scalable emulation platform for multi-robot reinforcement learning (MRRL) called SMART to meet this need.
arXiv Detail & Related papers (2022-06-20T06:36:45Z) - Addressing the IEEE AV Test Challenge with Scenic and VerifAI [10.221093591444731]
This paper summarizes our formal approach to testing autonomous vehicles (AVs) in simulation for the IEEE AV Test Challenge.
We demonstrate a systematic testing framework leveraging our previous work on formally-driven simulation for intelligent cyber-physical systems.
arXiv Detail & Related papers (2021-08-20T04:51:27Z) - AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factor space that impacts learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z) - Learning Meta Face Recognition in Unseen Domains [74.69681594452125]
We propose a novel face recognition method via meta-learning, named Meta Face Recognition (MFR).
MFR synthesizes the source/target domain shift with a meta-optimization objective.
We propose two benchmarks for generalized face recognition evaluation.
arXiv Detail & Related papers (2020-03-17T14:10:30Z) - AIBench: An Agile Domain-specific Benchmarking Methodology and an AI
Benchmark Suite [26.820244556465333]
This paper proposes an agile domain-specific benchmarking methodology.
We identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks.
We present the first end-to-end Internet service AI benchmark.
This list is automatically generated from the titles and abstracts of the papers on this site.