CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
- URL: http://arxiv.org/abs/2407.12797v1
- Date: Thu, 20 Jun 2024 21:36:00 GMT
- Title: CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
- Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai,
- Abstract summary: We introduce CEBench, an open-source toolkit for benchmarking online large language models.
It focuses on the critical trade-offs between expenditure and effectiveness required for LLM deployments.
This capability supports crucial decision-making processes aimed at maximizing effectiveness while minimizing cost impacts.
- Score: 29.25579967636023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages of LLMs and significant financial burdens. Additionally, the rapid evolution of models increases the frequency and redundancy of benchmarking efforts. Existing benchmarking toolkits, which typically focus on effectiveness, often overlook economic considerations, making their findings less applicable to practical scenarios. To address these challenges, we introduce CEBench, an open-source toolkit specifically designed for multi-objective benchmarking that focuses on the critical trade-offs between expenditure and effectiveness required for LLM deployments. CEBench allows for easy modifications through configuration files, enabling stakeholders to effectively assess and optimize these trade-offs. This strategic capability supports crucial decision-making processes aimed at maximizing effectiveness while minimizing cost impacts. By streamlining the evaluation process and emphasizing cost-effectiveness, CEBench seeks to facilitate the development of economically viable AI solutions across various industries and research fields. The code and demonstration are available in \url{https://github.com/amademicnoboday12/CEBench}.
Related papers
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context [0.9999629695552196]
Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS)
Yet, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms.
We detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations.
arXiv Detail & Related papers (2024-07-05T12:42:31Z) - Benchmarking Open-Source Language Models for Efficient Question Answering in Industrial Applications [0.0]
Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks such as question answering (QA)
This paper presents a comprehensive benchmarking study comparing open-source LLMs with their non-open-source counterparts on the task of question answering.
arXiv Detail & Related papers (2024-06-19T17:11:51Z) - Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance [10.364901568556435]
This paper presents a comparative analysis of total cost of ownership (TCO) and performance between domain-adapted large language models (LLM) and state-of-the-art (SoTA) LLMs.
arXiv Detail & Related papers (2024-04-12T23:37:56Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees [21.801053526411415]
Large Language Models (LLMs) have significantly boosted performance in natural language processing (NLP) tasks.
The deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance.
We introduce SMART, a novel framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality.
arXiv Detail & Related papers (2024-03-11T17:45:47Z) - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach [7.996010840316654]
We propose an uncertainty reduction framework using Large Language Models (LLMs) to improve entity resolution results.
LLMs capitalize on their advanced linguistic capabilities and a pay-as-you-go'' model that provides significant advantages to those without extensive data science expertise.
We show that our method is efficient and effective, offering promising applications in real-world tasks.
arXiv Detail & Related papers (2024-01-07T09:06:58Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.