Fugu-MT Paper Translation (Abstract): MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task

Paper overview: MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task

arxiv url: http://arxiv.org/abs/2510.24707v1
Date: Tue, 28 Oct 2025 17:56:20 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:37.332669
Title: MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
Authors: Juraj Juraska, Tobias Domhan, Mara Finkelstein, Tetsuji Nakagawa, Geza Kovacs, Daniel Deutsch, Pidong Wang, Markus Freitag
Abstract summary: We present our submissions to the unified WMT25 Translation Evaluation Shared Task. For the Quality Score Prediction subtask, we create a new generation of MetricX with improvements in the input format and the training protocol. For the Error Span Detection subtask, we develop a new model, GemSpanEval, trained to predict error spans along with their severities and categories.
Reference score (independently computed attention measure): 20.03717974553634
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present our submissions to the unified WMT25 Translation Evaluation Shared Task. For the Quality Score Prediction subtask, we create a new generation of MetricX with improvements in the input format and the training protocol, while for the Error Span Detection subtask we develop a new model, GemSpanEval, trained to predict error spans along with their severities and categories. Both systems are based on the state-of-the-art multilingual open-weights model Gemma 3, fine-tuned on publicly available WMT data. We demonstrate that MetricX-25, adapting Gemma 3 to an encoder-only architecture with a regression head on top, can be trained to effectively predict both MQM and ESA quality scores, and significantly outperforms its predecessor. Our decoder-only GemSpanEval model, on the other hand, we show to be competitive in error span detection with xCOMET, a strong encoder-only sequence-tagging baseline. With error span detection formulated as a generative task, we instruct the model to also output the context for each predicted error span, thus ensuring that error spans are identified unambiguously.
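The abstract's last point, that emitting context alongside each predicted error span makes the span unambiguous, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `locate_span` and the convention that the context is a longer substring of the translation containing the span are assumptions made purely for illustration.

```python
from typing import Optional, Tuple


def locate_span(text: str, span: str, context: str) -> Optional[Tuple[int, int]]:
    """Map a predicted error span to (start, end) character offsets in `text`.

    Hypothetical helper: `context` is assumed to be a longer substring of
    `text` that contains `span`. Returns None when the span cannot be
    placed unambiguously.
    """
    # The context must occur exactly once in the text to be useful.
    if text.count(context) != 1:
        return None
    ctx_start = text.find(context)
    # Locate the span inside the context (first occurrence, by assumption).
    offset = context.find(span)
    if offset == -1:
        return None
    start = ctx_start + offset
    return start, start + len(span)


translation = "the cat sat on the mat because the cat was tired."

# The bare span string occurs twice, so a plain search is ambiguous
# and silently picks the first occurrence.
naive_start = translation.find("the cat")

# With context, the second occurrence is selected.
resolved = locate_span(translation, "the cat", "because the cat was")
```

A generative model that returns only the span string cannot distinguish repeated phrases; requiring a context that occurs exactly once gives a simple way to recover exact offsets, and a `None` result flags predictions that remain ambiguous.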

Related papers