EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
- URL: http://arxiv.org/abs/2209.09593v2
- Date: Tue, 31 Oct 2023 15:27:43 GMT
- Title: EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
- Authors: Daniil Larionov, Jens Grünwald, Christoph Leiter, Steffen Eger
- Abstract summary: We provide a comprehensive evaluation of efficiency for MT evaluation metrics.
We evaluate six (reference-free and reference-based) metrics across three MT datasets and examine 16 lightweight transformers.
- Score: 21.72262031588122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiency is a key property to foster inclusiveness and reduce environmental
costs, especially in an era of LLMs. In this work, we provide a comprehensive
evaluation of efficiency for MT evaluation metrics. Our approach involves
replacing computation-intensive transformers with lighter alternatives and
employing linear and quadratic approximations for alignment algorithms on top
of LLM representations. We evaluate six (reference-free and reference-based)
metrics across three MT datasets and examine 16 lightweight transformers. In
addition, we look into the training efficiency of metrics like COMET by
utilizing adapters. Our results indicate that (a) TinyBERT provides the optimal
balance between quality and efficiency; (b) CPU speed-ups are more substantial
than those on GPU; (c) WMD approximations yield no efficiency gains while
reducing quality; and (d) adapters enhance training efficiency (regarding
backward pass speed and memory requirements) as well as, in some cases, metric
quality. These findings can help to strike a balance between evaluation speed
and quality, which is essential for effective NLG systems. Furthermore, our
research contributes to the ongoing efforts to optimize NLG evaluation metrics
with minimal impact on performance. To our knowledge, ours is the most
comprehensive analysis of different aspects of efficiency for MT metrics
conducted so far.
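Several of the metrics evaluated here score a translation by aligning contextual token embeddings of the candidate against the reference, which is exactly where swapping in a lighter encoder such as TinyBERT pays off. Below is a minimal, illustrative sketch of a BERTScore-style greedy-matching step, assuming the token embeddings have already been produced by some encoder; the function name and NumPy implementation are our own, not the paper's code:

```python
import numpy as np

def greedy_match_f1(cand: np.ndarray, ref: np.ndarray) -> float:
    """BERTScore-style greedy matching over token embeddings.

    cand: (m, d) candidate-token embeddings; ref: (n, d) reference-token
    embeddings, e.g. last hidden states of a small encoder such as TinyBERT.
    """
    # Normalize rows so dot products become cosine similarities.
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T  # (m, n) pairwise cosine similarities

    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)
```

Identical embeddings yield a score of 1.0. Note that replacing the encoder only changes how `cand` and `ref` are computed; the alignment itself costs O(m·n·d) regardless of encoder size, which is why encoder choice dominates the efficiency picture.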
Related papers
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.
Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
- Impact of ML Optimization Tactics on Greener Pre-Trained ML Models [46.78148962732881]
This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of applying various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) to 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
arXiv Detail & Related papers (2024-09-19T16:23:03Z)
- Evaluating Language Models for Efficient Code Generation [13.175840119811]
We introduce Differential Performance Evaluation (DPE) to reliably evaluate Large Language Models (LLMs).
DPE focuses on efficiency-demanding programming tasks and establishes an insightful compound metric for performance evaluation.
As a proof of concept, we use DPE to create EvalPerf, a benchmark with 121 performance-challenging coding tasks.
arXiv Detail & Related papers (2024-08-12T18:59:13Z)
- A deeper look at depth pruning of LLMs [49.30061112976263]
Large Language Models (LLMs) are resource-intensive to train and even more costly to deploy in production.
Recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance.
We show that adaptive metrics exhibit a trade-off in performance between tasks.
arXiv Detail & Related papers (2024-07-23T08:40:27Z)
- Lower-Left Partial AUC: An Effective and Efficient Optimization Metric for Recommendation [52.45394284415614]
We propose a new optimization metric, Lower-Left Partial AUC (LLPAUC), which is computationally efficient like AUC but strongly correlates with Top-K ranking metrics.
LLPAUC considers only the partial area under the ROC curve in the Lower-Left corner to push the optimization focus on Top-K.
arXiv Detail & Related papers (2024-02-29T13:58:33Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics [21.608384691401238]
We study the problem of optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall.
Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown.
We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations.
arXiv Detail & Related papers (2021-04-21T16:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.