OpenPerf: A Benchmarking Framework for the Sustainable Development of
the Open-Source Ecosystem
- URL: http://arxiv.org/abs/2311.15212v1
- Date: Sun, 26 Nov 2023 07:01:36 GMT
- Title: OpenPerf: A Benchmarking Framework for the Sustainable Development of
the Open-Source Ecosystem
- Authors: Fenglin Bi, Fanyu Han, Shengyu Zhao, Jinlu Li, Yanbin Zhang, Wei Wang
- Abstract summary: OpenPerf is a benchmarking framework designed for the sustainable development of the open-source ecosystem.
We implement 3 data science task benchmarks, 2 index-based benchmarks, and 1 standard benchmark.
We have developed a comprehensive toolkit for OpenPerf, which offers robust data management, tool integration, and user interface capabilities.
- Score: 6.188178422139467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarking involves designing scientific test methods, tools, and
frameworks to quantitatively and comparably assess specific performance
indicators of a given test subject. With the development of artificial
intelligence, AI benchmarking datasets such as ImageNet and DataPerf have
gradually become consensus standards in both academic and industrial fields.
However, constructing a benchmarking framework remains a significant challenge
in the open-source domain due to the diverse range of data types, the wide
array of research issues, and the intricate nature of collaboration networks.
This paper introduces OpenPerf, a benchmarking framework designed for the
sustainable development of the open-source ecosystem. The framework defines 9
benchmarking tasks in open-source research, encompassing 3 data types (time
series, text, and graphs) and addressing 6 research problems: regression,
classification, recommendation, ranking, network building, and anomaly
detection. Based on these tasks, we implemented 3 data science task
benchmarks, 2 index-based benchmarks, and 1 standard benchmark. Notably, the
index-based benchmarks have been adopted by the China Electronics
Standardization Institute as evaluation criteria for open-source community
governance. Additionally, we have developed a comprehensive toolkit for
OpenPerf, which not only offers robust data management, tool integration, and
user interface capabilities but also adopts a Benchmarking-as-a-Service (BaaS)
model to serve academic institutions, industries, and foundations. Through its
application in renowned companies and institutions such as Alibaba, Ant Group,
and East China Normal University, we have validated OpenPerf's pivotal role in
the healthy evolution of the open-source ecosystem.
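To make the taxonomy concrete, here is a minimal Python sketch of how benchmarking tasks that cross a data type with a research problem might be registered and scored behind a service-style evaluation loop. All names in it (`BenchmarkTask`, `TASKS`, `evaluate`, and both sample tasks and metrics) are hypothetical illustrations, not OpenPerf's actual API or task list.

```python
# A minimal sketch of the task taxonomy described in the abstract.
# Every name below is a hypothetical illustration, NOT OpenPerf's real API.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class BenchmarkTask:
    name: str       # e.g. "developer activity forecasting"
    data_type: str  # one of: "time series", "text", "graph"
    problem: str    # one of the 6 research problems listed in the abstract
    metric: Callable[[Sequence[float], Sequence[float]], float]

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, a plausible metric for a regression task."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of exact matches, a plausible metric for classification."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two illustrative tasks pairing a data type with a research problem.
TASKS = [
    BenchmarkTask("activity forecasting", "time series", "regression", mae),
    BenchmarkTask("issue-label prediction", "text", "classification", accuracy),
]

def evaluate(task: BenchmarkTask, model: Callable, inputs, y_true) -> float:
    """Score a model on a task: predict on each input, apply the task metric."""
    return task.metric(y_true, [model(x) for x in inputs])

if __name__ == "__main__":
    constant_baseline = lambda x: 5.0  # trivial baseline model
    print(evaluate(TASKS[0], constant_baseline, [1, 2, 3], [4.0, 5.0, 6.0]))
```

Pairing each task with its own metric keeps the evaluation loop uniform across data types and problems, which is in the spirit of the Benchmarking-as-a-Service model the abstract describes.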
Related papers
- IdeaBench: Benchmarking Large Language Models for Research Idea Generation [19.66218274796796]
Large Language Models (LLMs) have transformed how people interact with artificial intelligence (AI) systems.
We propose IdeaBench, a benchmark system that includes a comprehensive dataset and an evaluation framework.
Our dataset comprises titles and abstracts from a diverse range of influential papers, along with their referenced works.
Our evaluation framework is a two-stage process whose first stage uses GPT-4o to rank ideas against user-specified quality indicators such as novelty and feasibility, enabling scalable personalization.
arXiv Detail & Related papers (2024-10-31T17:04:59Z) - InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - A Benchmarking Study of Embedding-based Entity Alignment for Knowledge
Graphs [30.296238600596997]
Entity alignment seeks to find entities in different knowledge graphs that refer to the same real-world object.
Recent advances in KG embedding have driven the emergence of embedding-based entity alignment.
We survey 23 recent embedding-based entity alignment approaches and categorize them based on their techniques and characteristics.
arXiv Detail & Related papers (2020-03-10T05:32:06Z) - Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
The GitHub repository has reached 1,800 stars and 339 forks, demonstrating the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)