OpenPerf: A Benchmarking Framework for the Sustainable Development of
the Open-Source Ecosystem
- URL: http://arxiv.org/abs/2311.15212v1
- Date: Sun, 26 Nov 2023 07:01:36 GMT
- Title: OpenPerf: A Benchmarking Framework for the Sustainable Development of
the Open-Source Ecosystem
- Authors: Fenglin Bi, Fanyu Han, Shengyu Zhao, Jinlu Li, Yanbin Zhang, Wei Wang
- Abstract summary: OpenPerf is a benchmarking framework designed for the sustainable development of the open-source ecosystem.
We implement 3 data science task benchmarks, 2 index-based benchmarks, and 1 standard benchmark.
We have developed a comprehensive toolkit for OpenPerf, which offers robust data management, tool integration, and user interface capabilities.
- Score: 6.188178422139467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarking involves designing scientific test methods, tools, and
frameworks to quantitatively and comparably assess specific performance
indicators of a given test subject. With the development of artificial
intelligence, AI benchmarking datasets such as ImageNet and DataPerf have
gradually become consensus standards in both academic and industrial fields.
However, constructing a benchmarking framework remains a significant challenge
in the open-source domain due to the diverse range of data types, the wide
array of research issues, and the intricate nature of collaboration networks.
This paper introduces OpenPerf, a benchmarking framework designed for the
sustainable development of the open-source ecosystem. The framework defines 9
benchmarking tasks in open-source research, encompassing 3 data types (time
series, text, and graphs) and addressing 6 research problems: regression,
classification, recommendation, ranking, network building, and anomaly
detection. Based on these tasks, we implemented 3 data science task
benchmarks, 2 index-based benchmarks, and 1 standard benchmark. Notably, the
index-based benchmarks have been adopted by the China Electronics
Standardization Institute as evaluation criteria for open-source community
governance. Additionally, we have developed a comprehensive toolkit for
OpenPerf, which not only offers robust data management, tool integration, and
user interface capabilities but also adopts a Benchmarking-as-a-Service (BaaS)
model to serve academic institutions, industries, and foundations. Through its
application in renowned companies and institutions such as Alibaba, Ant Group,
and East China Normal University, we have validated OpenPerf's pivotal role in
the healthy evolution of the open-source ecosystem.
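To make the taxonomy concrete, here is a minimal Python sketch of how benchmarking tasks that cross a data type with a research problem might be registered and scored behind a service-style evaluation loop. All names in it (`BenchmarkTask`, `TASKS`, `evaluate`, and both sample tasks and metrics) are hypothetical illustrations, not OpenPerf's actual API or task list.

```python
# A minimal sketch of the task taxonomy described in the abstract.
# Every name below is a hypothetical illustration, NOT OpenPerf's real API.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class BenchmarkTask:
    name: str       # e.g. "developer activity forecasting"
    data_type: str  # one of: "time series", "text", "graph"
    problem: str    # one of the 6 research problems listed in the abstract
    metric: Callable[[Sequence[float], Sequence[float]], float]

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, a plausible metric for a regression task."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of exact matches, a plausible metric for classification."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two illustrative tasks pairing a data type with a research problem.
TASKS = [
    BenchmarkTask("activity forecasting", "time series", "regression", mae),
    BenchmarkTask("issue-label prediction", "text", "classification", accuracy),
]

def evaluate(task: BenchmarkTask, model: Callable, inputs, y_true) -> float:
    """Score a model on a task: predict on each input, apply the task metric."""
    return task.metric(y_true, [model(x) for x in inputs])

if __name__ == "__main__":
    constant_baseline = lambda x: 5.0  # trivial baseline model
    print(evaluate(TASKS[0], constant_baseline, [1, 2, 3], [4.0, 5.0, 6.0]))
```

Pairing each task with its own metric keeps the evaluation loop uniform across data types and problems, which is in the spirit of the Benchmarking-as-a-Service model the abstract describes.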
Related papers
- IdeaBench: Benchmarking Large Language Models for Research Idea Generation [19.66218274796796]
Large Language Models (LLMs) have transformed how people interact with artificial intelligence (AI) systems.
We propose IdeaBench, a benchmark system that includes a comprehensive dataset and an evaluation framework.
Our dataset comprises titles and abstracts from a diverse range of influential papers, along with their referenced works.
Our evaluation framework is a two-stage process whose first stage uses GPT-4o to rank ideas against user-specified quality indicators such as novelty and feasibility, enabling scalable personalization.
arXiv Detail & Related papers (2024-10-31T17:04:59Z) - InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - A Benchmarking Study of Embedding-based Entity Alignment for Knowledge
Graphs [30.296238600596997]
Entity alignment seeks to find entities in different knowledge graphs that refer to the same real-world object.
Recent advances in KG embedding have driven the emergence of embedding-based entity alignment.
We survey 23 recent embedding-based entity alignment approaches and categorize them based on their techniques and characteristics.
arXiv Detail & Related papers (2020-03-10T05:32:06Z) - Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
The GitHub repository has reached 1,800 stars and 339 forks, demonstrating the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)