Fugu-MT Paper Translation (Summary): Data Troubles in Sentence Level Confidence Estimation for Machine Translation

Paper summary: Data Troubles in Sentence Level Confidence Estimation for Machine Translation

arxiv url: http://arxiv.org/abs/2010.13856v1
Date: Mon, 26 Oct 2020 19:20:29 GMT
Status: translation complete
Last updated in system: 2022-10-02 18:05:56.672911
Title: Data Troubles in Sentence Level Confidence Estimation for Machine Translation
Title (reference translation): Data Problems in Sentence-Level Confidence Estimation for Machine Translation
Authors: Ciprian Chelba, Junpei Zhou, Yuezhang (Music) Li, Hideto Kazawa, Jeff Klingner, Mengmeng Niu
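Abstract summary: This paper investigates the feasibility of confidence estimation for neural machine translation models operating at the high end of the performance spectrum. It proposes sentence-level accuracy $SACC$ as a simple, self-explanatory evaluation metric for translation quality.
Reference score (attention score computed by this site): 4.879462316849671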
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The paper investigates the feasibility of confidence estimation for neural machine translation models operating at the high end of the performance spectrum. As a side product of the data annotation process necessary for building such models we propose sentence level accuracy $SACC$ as a simple, self-explanatory evaluation metric for quality of translation. Experiments on two different annotator pools, one comprised of non-expert (crowd-sourced) and one of expert (professional) translators show that $SACC$ can vary greatly depending on the translation proficiency of the annotators, despite the fact that both pools are about equally reliable according to Krippendorff's alpha metric; the relatively low values of inter-annotator agreement confirm the expectation that sentence-level binary labeling $good$ / $needs\ work$ for translation out of context is very hard. For an English-Spanish translation model operating at $SACC = 0.89$ according to a non-expert annotator pool we can derive a confidence estimate that labels 0.5-0.6 of the $good$ translations in an "in-domain" test set with 0.95 Precision. Switching to an expert annotator pool decreases $SACC$ dramatically: $0.61$ for English-Spanish, measured on the exact same data as above. This forces us to lower the CE model operating point to 0.9 Precision while labeling correctly about 0.20-0.25 of the $good$ translations in the data. We find surprising the extent to which CE depends on the level of proficiency of the annotator pool used for labeling the data. This leads to an important recommendation we wish to make when tackling CE modeling in practice: it is critical to match the end-user expectation for translation quality in the desired domain with the demands of annotators assigning binary quality labels to CE training data.
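Abstract (reference translation): This paper investigates the feasibility of confidence estimation for neural machine translation models operating at the high end of the performance spectrum. As a by-product of the data annotation process needed to build such models, it proposes sentence-level accuracy $SACC$ as a simple, self-explanatory evaluation metric for translation quality. Experiments on two different annotator pools, one comprised of non-expert (crowd-sourced) and one of expert (professional) translators show that $SACC$ can vary greatly depending on the translation proficiency of the annotators, despite the fact that both pools are about equally reliable according to Krippendorff's alpha metric; the relatively low values of inter-annotator agreement confirm the expectation that sentence-level binary labeling $good$ / $needs\ work$ for translation out of context is very hard. For an English-Spanish translation model operating at $SACC = 0.89$, a confidence estimate can be derived that labels 0.5-0.6 of the $good$ translations in an "in-domain" test set with 0.95 Precision. Switching to an expert annotator pool decreases $SACC$ dramatically: $0.61$ for English-Spanish, measured on the exact same data. This forces the CE model operating point down to 0.9 Precision while correctly labeling about 0.20-0.25 of the $good$ translations in the data. The extent to which CE depends on the proficiency of the annotator pool used for labeling the data is surprising. It is critical to match the end-user expectation for translation quality in the desired domain with the demands of the annotators assigning binary quality labels to CE training data.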
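The two quantities the abstract reports, $SACC$ and a CE operating point at a target Precision, can be illustrated with a minimal Python sketch. It is not the paper's CE model. The function names `sacc` and `operating_point`, the toy scores, and the toy labels are all hypothetical. It assumes each sentence has a binary label (1 for $good$, 0 for $needs\ work$) and a scalar confidence score.

```python
from typing import List, Tuple


def sacc(labels: List[int]) -> float:
    """Sentence-level accuracy: fraction of sentences labeled good (1)."""
    return sum(labels) / len(labels)


def operating_point(scores: List[float], labels: List[int],
                    target_precision: float) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the lowest score threshold
    whose accepted set still reaches target_precision.

    Sentences with score >= threshold are predicted good. Recall is the
    fraction of all good translations that the threshold accepts.
    """
    total_good = sum(labels)
    if total_good == 0:
        raise ValueError("no good translations in the data")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = None
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Evaluate only at the end of a run of tied scores, so that the
        # threshold is well defined.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = true_pos / i
        if precision >= target_precision:
            best = (score, precision, true_pos / total_good)
    if best is None:
        raise ValueError("target precision not reachable")
    return best


# Hypothetical toy data: CE scores and annotator labels for ten sentences.
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

print("SACC:", sacc(toy_labels))
print("operating point:", operating_point(toy_scores, toy_labels, 0.8))
```

Relabeling the same sentences with a stricter annotator pool lowers `sacc` and typically lowers the recall achievable at a fixed Precision, which is the effect the abstract describes.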

Related papers

Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation [55.73341401764367]
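This paper introduces ADSQE, a novel framework for alleviating distribution shift in synthetic QE data. ADSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes. Experiments show that ADSQE outperforms SOTA baselines such as COMET in both supervised and unsupervised settings.
Paper reference translation (metadata) (2025-02-27T10:11:53Z)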
Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models [0.0]
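Confidence calibration is an emerging challenge in real-world decision systems based on foundation models. This work proposes a penalty, incorporated into the loss objective, that penalizes misclassifications during fine-tuning. It is called the confidence misalignment penalty (CMP).
Paper reference translation (metadata) (2025-01-29T11:54:37Z)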
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
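The proposed Multi-Task Correction MoE trains the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets. As a multi-task model, NeKo performs competitively on grammar and post-OCR correction.
Paper reference translation (metadata) (2024-11-08T20:11:24Z)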
Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
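Existing approaches cannot consider error position and type simultaneously. We build an FG-TED model to predict addition and omission errors. Experiments show that the model can identify error type and position concurrently and achieves state-of-the-art results.
Paper reference translation (metadata) (2023-02-17T16:20:33Z)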
Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
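It is unclear whether automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level. We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks. Our experiments show that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
Paper reference translation (metadata) (2022-12-20T14:39:58Z)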
Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
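We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words. We propose two tag correcting strategies, a tag refinement strategy and a tree-based annotation strategy, to bring the TER-based artificial QE corpus closer to HJQE. The results show that the proposed dataset is consistent with human judgement and confirm the effectiveness of the proposed tag correcting strategies.
Paper reference translation (metadata) (2022-09-13T02:37:12Z)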
Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages [6.049660810617423]
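XLMRScore is a cross-lingual counterpart of BERTScore computed with the XLM-RoBERTa (XLMR) model. We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task.
Paper reference translation (metadata) (2022-07-31T16:23:23Z)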
Understanding and Mitigating the Uncertainty in Zero-Shot Translation [92.25357943169601]
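We aim to understand and mitigate the off-target problem from the perspective of uncertainty in zero-shot translation. We propose two lightweight and complementary approaches to denoise the training data for model training. The proposed approaches significantly improve zero-shot translation performance over strong MNMT baselines.
Paper reference translation (metadata) (2022-05-20T10:29:46Z)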
Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
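This work carries out motivated research on correctly estimating confidence intervals depending on the sample size of the translated text. We apply the methods of Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
Paper reference translation (metadata) (2021-11-15T12:09:08Z)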
Consider the Alternatives: Navigating Fairness-Accuracy Tradeoffs via Disqualification [7.9649015115693444]
In many machine learning settings there is an inherent tension between fairness and accuracy desiderata. We introduce and study $\gamma$-disqualification, a new framework for reasoning about fairness-accuracy tradeoffs. For example, $\gamma$-disqualification can be used to compare learning methods in terms of how they trade off fairness and accuracy.
Paper reference translation (metadata) (2021-10-02T14:32:51Z)
Verdi: Quality Estimation and Error Detection for Bilingual [23.485380293716272]
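Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. It exploits the symmetric nature of bilingual corpora and applies model-level dual learning to the NMT predictor. Our method beats the competition winner and outperforms other baseline methods by a large margin.
Paper reference translation (metadata) (2021-05-31T11:04:13Z)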

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

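This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from the use of this site (including all information and translations).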