xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
- URL: http://arxiv.org/abs/2406.14553v2
- Date: Fri, 08 Nov 2024 15:50:51 GMT
- Title: xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
- Authors: Daniil Larionov, Mikhail Seleznyov, Vasiliy Viskov, Alexander Panchenko, Steffen Eger
- Abstract summary: State-of-the-art machine translation evaluation metrics like xCOMET achieve high correlation with human judgment but rely on large encoders.
We employ distillation, quantization, and pruning techniques to create efficient xCOMET alternatives.
Our experiments show that, using quantization, xCOMET can be compressed up to three times with no quality degradation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art trainable machine translation evaluation metrics like xCOMET achieve high correlation with human judgment but rely on large encoders (up to 10.7B parameters), making them computationally expensive and inaccessible to researchers with limited resources. To address this issue, we investigate whether the knowledge stored in these large encoders can be compressed while maintaining quality. We employ distillation, quantization, and pruning techniques to create efficient xCOMET alternatives and introduce a novel data collection pipeline for efficient black-box distillation. Our experiments show that, using quantization, xCOMET can be compressed up to three times with no quality degradation. Additionally, through distillation, we create a 278M-parameter xCOMET-lite metric, which has only 2.6% of xCOMET-XXL's parameters but retains 92.1% of its quality. Moreover, it surpasses strong small-scale metrics like COMET-22 and BLEURT-20 on the WMT22 metrics challenge dataset by 6.4%, despite using 50% fewer parameters. All code, datasets, and models are available online at https://github.com/NL2G/xCOMET-lite.
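The headline quantization result can be illustrated with a minimal sketch. This is not the paper's actual pipeline (which quantizes the weights of a large transformer encoder); it is the standard symmetric absmax int8 scheme that underlies most 8-bit model compression, applied to a toy weight vector. The function names `quantize_int8` and `dequantize` are illustrative, not taken from the xCOMET-lite codebase.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: map floats to integer codes in [-127, 127]."""
    # Scale so the largest-magnitude weight maps to +/-127; guard against all-zero input.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.31, -1.27, 0.05, 0.89, -0.44]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# int8 storage is 4x smaller than float32; the rounding error per weight
# is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q)
print(max_err <= scale / 2 + 1e-12)
```

In practice the bulk of an encoder's memory is in such weight matrices, which is why pushing them from 32-bit floats to 8-bit integers (or lower) yields the roughly 3x-4x compression reported above, with quality depending on how well the value range tolerates the coarser grid.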
Related papers
- MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices [0.0]
Human Activity Recognition (HAR) on resource-constrained wearables requires models that balance accuracy against strict memory and computational budgets.
We present MicroBi-ConvLSTM, an ultra-lightweight convolutional-recurrent architecture with only 11.4K parameters on average.
arXiv Detail & Related papers (2026-02-06T09:26:29Z) - Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification [0.0]
Multimodal chest X-ray analysis often fine-tunes large vision-language models, which is computationally costly.
We study parameter-efficient training strategies, including frozen encoders, BitFit, LoRA, and adapters, for multi-label classification on the Indiana University Chest X-Ray dataset.
arXiv Detail & Related papers (2025-12-25T05:02:19Z) - Evaluating Embedding Models and Pipeline Optimization for AI Search Quality [0.0]
We evaluate the performance of various text embedding models and pipeline configurations for AI-driven search systems.
A custom evaluation dataset of 11,975 query-chunk pairs was synthesized from US City Council meeting transcripts.
arXiv Detail & Related papers (2025-11-27T09:09:39Z) - Dimension vs. Precision: A Comparative Analysis of Autoencoders and Quantization for Efficient Vector Retrieval on BEIR SciFact [0.0]
Int8 quantization provides the most effective "sweet spot," achieving 4x compression with a negligible 1-2% drop in nDCG@10.
Autoencoders show graceful degradation but suffer a more significant performance loss at equivalent 4x compression ratios.
Binary quantization was found to be unsuitable for this task due to catastrophic performance drops.
arXiv Detail & Related papers (2025-11-17T07:02:11Z) - Unifying Mixture of Experts and Multi-Head Latent Attention for Efficient Language Models [1.7272658301768147]
MoE-MLA-RoPE is a novel architecture that combines Mixture of Experts (MoE) with Multi-head Latent Attention (MLA) and Rotary Position Embeddings (RoPE) for efficient language modeling.
Our approach addresses the fundamental trade-off between model capacity and computational efficiency through three key innovations.
arXiv Detail & Related papers (2025-08-02T08:33:30Z) - LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs [0.0]
We show how linear probes can be used to estimate the performance of a compressed large language model.
We also show their suitability for setting the cut-off point when applying layer-pruning compression.
Our approach, dubbed LPASS, is applied to BERT and Gemma for the detection of 12 of MITRE's Top 25 most dangerous vulnerabilities on 480k C/C++ samples.
arXiv Detail & Related papers (2025-05-30T10:37:14Z) - EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs.
We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z) - Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression [55.323397702682506]
Post-training quantization (PTQ) reduces a model's memory footprint by mapping full-precision weights into low-bit weights without costly retraining.
We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery.
arXiv Detail & Related papers (2025-04-10T02:19:03Z) - Ultra-Resolution Adaptation with Ease [62.56434979517156]
We propose a set of key guidelines for ultra-resolution adaptation termed URAE.
We show that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable.
Experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations.
arXiv Detail & Related papers (2025-03-20T16:44:43Z) - MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task [21.490930342296256]
We present the MetricX-24 submissions to the WMT24 Metrics Shared Task.
Our primary submission is a hybrid reference-based/free metric.
We show a significant performance increase over MetricX-23 on the WMT23 MQM ratings, as well as our new synthetic challenge set.
arXiv Detail & Related papers (2024-10-04T23:52:28Z) - Evaluating Automatic Metrics with Incremental Machine Translation Systems [55.78547133890403]
We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions.
We assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations.
arXiv Detail & Related papers (2024-07-03T17:04:17Z) - TernaryLLM: Ternarized Large Language Model [29.29122031050894]
Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks.
We introduce Dual Learnable Ternarization (DLT), which enables both scales and shifts to be learnable.
We also propose Outlier-Friendly Feature Knowledge Distillation (OFF) to recover the information lost in extremely low-bit quantization.
arXiv Detail & Related papers (2024-06-11T11:40:12Z) - Elucidating the Design Space of Dataset Condensation [23.545641118984115]
A concept within data-centric learning, dataset condensation efficiently transfers critical attributes from an original dataset to a synthetic version.
We propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching.
In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%.
arXiv Detail & Related papers (2024-04-21T18:19:27Z) - Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z) - Merging Experts into One: Improving Computational Efficiency of Mixture of Experts [71.44422347502409]
A sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters.
Can we retain the advantages of adding more experts without substantially increasing the computational costs?
We propose a computation-efficient approach called Merging Experts into One (MEO), which reduces the computation cost to that of a single expert.
arXiv Detail & Related papers (2023-10-15T13:28:42Z) - Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It [11.571189144910521]
Extra-large pre-trained language models (xLPLMs) are claimed to achieve superior performance over smaller-sized PLMs.
In this work, we examine whether xLPLMs are absolutely superior to smaller-sized PLMs when fine-tuned for domain-specific MT.
arXiv Detail & Related papers (2022-09-15T16:12:26Z) - Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems [63.713297451300086]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
arXiv Detail & Related papers (2022-06-15T20:44:23Z) - Efficient Inference for Multilingual Neural Machine Translation [60.10996883354372]
We consider several ways to make multilingual NMT faster at inference without degrading its quality.
Our experiments demonstrate that combining a shallow decoder with vocabulary filtering leads to more than twice faster inference with no loss in translation quality.
arXiv Detail & Related papers (2021-09-14T13:28:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.