Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
- URL: http://arxiv.org/abs/2307.09701v1
- Date: Wed, 19 Jul 2023 01:05:33 GMT
- Title: Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
- Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared
Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell
Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
- Abstract summary: Pentathlon is a benchmark for holistic and realistic evaluation of model efficiency.
Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle.
It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption.
- Score: 82.85015548989223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rising computational demands of modern natural language processing (NLP)
systems have increased the barrier to entry for cutting-edge research while
posing serious environmental concerns. Yet, progress on model efficiency has
been impeded by practical challenges in model evaluation and comparison. For
example, hardware is challenging to control due to disparate levels of
accessibility across different institutions. Moreover, improvements in metrics
such as FLOPs often fail to translate to progress in real-world applications.
In response, we introduce Pentathlon, a benchmark for holistic and realistic
evaluation of model efficiency. Pentathlon focuses on inference, which accounts
for a majority of the compute in a model's lifecycle. It offers a
strictly controlled hardware platform, and is designed to mirror real-world
application scenarios. It incorporates a suite of metrics that target
different aspects of efficiency, including latency, throughput, memory
overhead, and energy consumption. Pentathlon also comes with a software library
that can be seamlessly integrated into any codebase to enable evaluation. As a
standardized and centralized evaluation platform, Pentathlon can drastically
reduce the workload of making fair and reproducible efficiency comparisons. While
initially focused on NLP models, Pentathlon is
designed to allow flexible extension to other fields. We envision Pentathlon
will stimulate algorithmic innovations in building efficient models, and foster
an increased awareness of the social and environmental implications of
developing future-generation NLP models.
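The Pentathlon submission library itself is not reproduced here; as a rough sketch of the kind of instrumentation such a platform standardizes, the minimal Python helper below times batched inference and reports latency, throughput, and peak GPU memory. It is a hypothetical illustration, not the Pentathlon API, and it assumes a PyTorch `nn.Module` and an iterable of input batches.

```python
import time
import torch

def measure_inference(model, batches, device="cuda"):
    """Hypothetical sketch of latency/throughput/memory measurement.

    NOT the Pentathlon API: `model` is any torch.nn.Module and `batches`
    is any iterable of input tensors. Energy consumption would require
    additional power metering (e.g., NVML or a hardware meter).
    """
    model.eval().to(device)
    torch.cuda.reset_peak_memory_stats(device)
    latencies, n_examples = [], 0
    with torch.inference_mode():
        for batch in batches:
            batch = batch.to(device)
            torch.cuda.synchronize(device)   # flush queued work before timing
            start = time.perf_counter()
            model(batch)
            torch.cuda.synchronize(device)   # wait for kernels to finish
            latencies.append(time.perf_counter() - start)
            n_examples += batch.shape[0]
    total = sum(latencies)
    return {
        "mean_latency_s": total / len(latencies),
        "throughput_examples_per_s": n_examples / total,
        "peak_memory_mib": torch.cuda.max_memory_allocated(device) / 2**20,
    }
```

A standardized platform matters precisely because numbers from a sketch like this vary with hardware, batch size, and measurement details.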
Related papers
- Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications [0.0]
Run-to-run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect algorithms.
We investigate the statistical properties of floating-point non-associativity within modern parallel programming models.
We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning.
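As a minimal, self-contained illustration of the two points above (not code from the paper): floating-point addition is order-dependent, and PyTorch exposes an opt-in deterministic mode.

```python
import os
import torch

# Floating-point addition is not associative: regrouping the same three
# values changes the rounded result, which is why parallel reductions
# that sum in nondeterministic order can vary from run to run.
a, b, c = 1e16, 1.0, 1.0
print((a + b) + c)  # 1e16          (each +1.0 is lost to rounding)
print(a + (b + c))  # 1.0000000000000002e16

# PyTorch's deterministic mode forces reproducible (often slower) kernels
# and raises an error for ops with no deterministic implementation.
# Some CUDA/cuBLAS ops additionally require this workspace setting.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)
```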
arXiv Detail & Related papers (2024-08-09T16:07:37Z)
- Efficient Facial Landmark Detection for Embedded Systems [1.0878040851638]
This paper introduces the Efficient Facial Landmark Detection (EFLD) model, specifically designed for edge devices confronted with the challenges related to power consumption and time latency.
EFLD features a lightweight backbone and a flexible detection head, each significantly enhancing operational efficiency on resource-constrained devices.
We propose a cross-format training strategy to enhance the model's generalizability and robustness, without increasing inference costs.
arXiv Detail & Related papers (2024-07-14T14:49:20Z)
- Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems [11.712948114304925]
Large language models (LLMs) in production can incur substantial costs.
We propose Etalon, a comprehensive performance evaluation framework that includes a new metric, fluidity-index.
We also evaluate various open-source platforms and model-as-a-service offerings using Etalon.
arXiv Detail & Related papers (2024-07-09T16:13:26Z)
- Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device.
It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z)
- Large Language Models to Enhance Bayesian Optimization [57.474613739645605]
We present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLMs) within Bayesian optimization (BO).
At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations.
Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse.
arXiv Detail & Related papers (2024-02-06T11:44:06Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing [76.38975568873765]
We introduce HULK, a multi-task energy efficiency benchmarking platform for responsible natural language processing.
We compare pretrained models' energy efficiency from the perspectives of time and cost.
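As a back-of-the-envelope illustration of that time-and-cost framing (the power draw and hourly price below are made-up placeholders, not figures from HULK):

```python
# Hypothetical numbers: a 300 W accelerator running inference for 2 hours
# on an instance billed at $3.00/hour. None of these values come from HULK.
power_watts = 300.0
runtime_hours = 2.0
price_per_hour = 3.00

energy_kwh = power_watts * runtime_hours / 1000.0  # 0.6 kWh consumed
cost_usd = runtime_hours * price_per_hour          # $6.00 billed
print(f"energy: {energy_kwh:.2f} kWh, cost: ${cost_usd:.2f}")
```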
arXiv Detail & Related papers (2020-02-14T01:04:19Z)