Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
- URL: http://arxiv.org/abs/2307.09701v1
- Date: Wed, 19 Jul 2023 01:05:33 GMT
- Title: Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
- Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared
Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell
Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
- Abstract summary: Pentathlon is a benchmark for holistic and realistic evaluation of model efficiency.
Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle.
It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption.
- Score: 82.85015548989223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rising computational demands of modern natural language processing (NLP)
systems have increased the barrier to entry for cutting-edge research while
posing serious environmental concerns. Yet, progress on model efficiency has
been impeded by practical challenges in model evaluation and comparison. For
example, hardware is challenging to control due to disparate levels of
accessibility across different institutions. Moreover, improvements in metrics
such as FLOPs often fail to translate to progress in real-world applications.
In response, we introduce Pentathlon, a benchmark for holistic and realistic
evaluation of model efficiency. Pentathlon focuses on inference, which accounts
for a majority of the compute in a model's lifecycle. It offers a
strictly controlled hardware platform, and is designed to mirror real-world
application scenarios. It incorporates a suite of metrics that target
different aspects of efficiency, including latency, throughput, memory
overhead, and energy consumption. Pentathlon also comes with a software library
that can be seamlessly integrated into any codebase to enable evaluation. As a
standardized and centralized evaluation platform, Pentathlon can drastically
reduce the workload of making fair and reproducible efficiency comparisons. While
initially focused on NLP models, Pentathlon is
designed to allow flexible extension to other fields. We envision Pentathlon
will stimulate algorithmic innovations in building efficient models, and foster
an increased awareness of the social and environmental implications of
developing future-generation NLP models.
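The Pentathlon submission library itself is not reproduced here; as a rough sketch of the kind of instrumentation such a platform standardizes, the minimal Python helper below times batched inference and reports latency, throughput, and peak GPU memory. It is a hypothetical illustration, not the Pentathlon API, and it assumes a PyTorch `nn.Module` and an iterable of input batches.

```python
import time
import torch

def measure_inference(model, batches, device="cuda"):
    """Hypothetical sketch of latency/throughput/memory measurement.

    NOT the Pentathlon API: `model` is any torch.nn.Module and `batches`
    is any iterable of input tensors. Energy consumption would require
    additional power metering (e.g., NVML or a hardware meter).
    """
    model.eval().to(device)
    torch.cuda.reset_peak_memory_stats(device)
    latencies, n_examples = [], 0
    with torch.inference_mode():
        for batch in batches:
            batch = batch.to(device)
            torch.cuda.synchronize(device)   # flush queued work before timing
            start = time.perf_counter()
            model(batch)
            torch.cuda.synchronize(device)   # wait for kernels to finish
            latencies.append(time.perf_counter() - start)
            n_examples += batch.shape[0]
    total = sum(latencies)
    return {
        "mean_latency_s": total / len(latencies),
        "throughput_examples_per_s": n_examples / total,
        "peak_memory_mib": torch.cuda.max_memory_allocated(device) / 2**20,
    }
```

A standardized platform matters precisely because numbers from a sketch like this vary with hardware, batch size, and measurement details.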
Related papers
- Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications [0.0]
Run-to-run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect algorithms.
We investigate the statistical properties of floating-point non-associativity within modern parallel programming models.
We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning.
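As a minimal, self-contained illustration of the two points above (not code from the paper): floating-point addition is order-dependent, and PyTorch exposes an opt-in deterministic mode.

```python
import os
import torch

# Floating-point addition is not associative: regrouping the same three
# values changes the rounded result, which is why parallel reductions
# that sum in nondeterministic order can vary from run to run.
a, b, c = 1e16, 1.0, 1.0
print((a + b) + c)  # 1e16          (each +1.0 is lost to rounding)
print(a + (b + c))  # 1.0000000000000002e16

# PyTorch's deterministic mode forces reproducible (often slower) kernels
# and raises an error for ops with no deterministic implementation.
# Some CUDA/cuBLAS ops additionally require this workspace setting.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)
```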
arXiv Detail & Related papers (2024-08-09T16:07:37Z)
- Efficient Facial Landmark Detection for Embedded Systems [1.0878040851638]
This paper introduces the Efficient Facial Landmark Detection (EFLD) model, specifically designed for edge devices confronted with the challenges related to power consumption and time latency.
EFLD features a lightweight backbone and a flexible detection head, each significantly enhancing operational efficiency on resource-constrained devices.
We propose a cross-format training strategy to enhance the model's generalizability and robustness, without increasing inference costs.
arXiv Detail & Related papers (2024-07-14T14:49:20Z)
- Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems [11.712948114304925]
Large language models (LLMs) in production can incur substantial costs.
We propose Etalon, a comprehensive performance evaluation framework that includes a new metric, fluidity-index.
We also evaluate various open-source platforms and model-as-a-service offerings using Etalon.
arXiv Detail & Related papers (2024-07-09T16:13:26Z)
- Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device.
It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z)
- Large Language Models to Enhance Bayesian Optimization [57.474613739645605]
We present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLMs) within Bayesian optimization (BO).
At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations.
Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse.
arXiv Detail & Related papers (2024-02-06T11:44:06Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing [76.38975568873765]
We introduce HULK, a multi-task energy efficiency benchmarking platform for responsible natural language processing.
We compare pretrained models' energy efficiency from the perspectives of time and cost.
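As a back-of-the-envelope illustration of that time-and-cost framing (the power draw and hourly price below are made-up placeholders, not figures from HULK):

```python
# Hypothetical numbers: a 300 W accelerator running inference for 2 hours
# on an instance billed at $3.00/hour. None of these values come from HULK.
power_watts = 300.0
runtime_hours = 2.0
price_per_hour = 3.00

energy_kwh = power_watts * runtime_hours / 1000.0  # 0.6 kWh consumed
cost_usd = runtime_hours * price_per_hour          # $6.00 billed
print(f"energy: {energy_kwh:.2f} kWh, cost: ${cost_usd:.2f}")
```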
arXiv Detail & Related papers (2020-02-14T01:04:19Z)