EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
- URL: http://arxiv.org/abs/2209.09593v2
- Date: Tue, 31 Oct 2023 15:27:43 GMT
- Title: EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
- Authors: Daniil Larionov, Jens Grünwald, Christoph Leiter, Steffen Eger
- Abstract summary: We provide a comprehensive evaluation of efficiency for MT evaluation metrics.
We evaluate six (reference-free and reference-based) metrics across three MT datasets and examine 16 lightweight transformers.
- Score: 21.72262031588122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiency is a key property to foster inclusiveness and reduce environmental
costs, especially in an era of LLMs. In this work, we provide a comprehensive
evaluation of efficiency for MT evaluation metrics. Our approach involves
replacing computation-intensive transformers with lighter alternatives and
employing linear and quadratic approximations for alignment algorithms on top
of LLM representations. We evaluate six (reference-free and reference-based)
metrics across three MT datasets and examine 16 lightweight transformers. In
addition, we look into the training efficiency of metrics like COMET by
utilizing adapters. Our results indicate that (a) TinyBERT provides the optimal
balance between quality and efficiency; (b) CPU speed-ups are more substantial
than those on GPU; (c) WMD approximations yield no efficiency gains while
reducing quality; and (d) adapters enhance training efficiency (regarding
backward pass speed and memory requirements) as well as, in some cases, metric
quality. These findings can help to strike a balance between evaluation speed
and quality, which is essential for effective NLG systems. Furthermore, our
research contributes to the ongoing efforts to optimize NLG evaluation metrics
with minimal impact on performance. To our knowledge, ours is the most
comprehensive analysis of different aspects of efficiency for MT metrics
conducted so far.
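
The abstract mentions quadratic approximations for alignment algorithms such as WMD (Word Mover's Distance). A minimal sketch of one well-known quadratic-time relaxation, relaxed WMD, follows. Whether it matches the exact variant evaluated in the paper is an assumption, and the function names, toy 2-d embeddings, and uniform token weights are hypothetical illustrations rather than the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(a_vecs, a_weights, b_vecs):
    """Send each token's full mass to its nearest token on the other side."""
    return sum(
        w * min(euclidean(a, b) for b in b_vecs)
        for a, w in zip(a_vecs, a_weights)
    )


def relaxed_wmd(src_vecs, src_weights, tgt_vecs, tgt_weights):
    """Relaxed WMD: the larger of the two one-sided nearest-neighbour costs.

    Each one-sided cost drops one marginal constraint of the exact
    transport problem, so the result is a lower bound on exact WMD.
    Cost is quadratic in the number of tokens.
    """
    forward = one_sided_cost(src_vecs, src_weights, tgt_vecs)
    backward = one_sided_cost(tgt_vecs, tgt_weights, src_vecs)
    return max(forward, backward)


# Toy stand-ins for contextual token embeddings (assumed values).
hypothesis = [[0.0, 0.0], [1.0, 0.0]]
reference = [[0.0, 0.0], [1.0, 1.0]]
uniform = [0.5, 0.5]  # token weights sum to 1

score = relaxed_wmd(hypothesis, uniform, reference, uniform)
print(score)
```

In a real metric the vectors would come from a transformer encoder, and the relaxed score would replace the exact optimal-transport solve. The abstract reports that such approximations gave no efficiency gain while reducing quality, so this sketch illustrates the technique rather than recommending it.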
Related papers
- Adaptive Data Exploitation in Deep Reinforcement Learning [50.53705050673944]
We introduce ADEPT, a powerful framework to enhance the **data efficiency** and **generalization** in deep reinforcement learning (RL).
Specifically, ADEPT adaptively manages the use of sampled data across different learning stages via multi-armed bandit (MAB) algorithms.
We test ADEPT on benchmarks including Procgen, MiniGrid, and PyBullet.
arXiv Detail & Related papers (2025-01-22T04:01:17Z)
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.
This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.
Our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10% respectively.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
- Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings [1.5749416770494706]
Large language models (LLMs) have shown significant improvements in many natural language processing (NLP) tasks.
LLMs are resource-intensive, requiring extensive computational resources both during training and inference.
As their adoption accelerates, the sustainability of LLMs has become a critical issue.
arXiv Detail & Related papers (2025-01-14T16:02:33Z)
- Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
Sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model.
We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies.
Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.
Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
- Impact of ML Optimization Tactics on Greener Pre-Trained ML Models [46.78148962732881]
This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) on 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
arXiv Detail & Related papers (2024-09-19T16:23:03Z)
- A deeper look at depth pruning of LLMs [49.30061112976263]
Large Language Models (LLMs) are resource-intensive to train but more costly to deploy in production.
Recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance.
We show that adaptive metrics exhibit a trade-off in performance between tasks.
arXiv Detail & Related papers (2024-07-23T08:40:27Z)
- Lower-Left Partial AUC: An Effective and Efficient Optimization Metric for Recommendation [52.45394284415614]
We propose a new optimization metric, Lower-Left Partial AUC (LLPAUC), which is computationally efficient like AUC but strongly correlates with Top-K ranking metrics.
LLPAUC considers only the partial area under the ROC curve in the Lower-Left corner to push the optimization focus on Top-K.
arXiv Detail & Related papers (2024-02-29T13:58:33Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally requires updating a large number of parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics [21.608384691401238]
We study the problem of optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall.
Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown.
We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations.
arXiv Detail & Related papers (2021-04-21T16:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.