Fugu-MT Paper Translation (Abstract): An Overview on Machine Translation Evaluation

Paper abstract: An Overview on Machine Translation Evaluation

arxiv url: http://arxiv.org/abs/2202.11027v1
Date: Tue, 22 Feb 2022 16:58:28 GMT
Status: Translation complete
Last updated in system: 2022-02-23 15:04:16.738000
Title: An Overview on Machine Translation Evaluation
Title (reference translation): Overview of Machine Translation Evaluation
Authors: Lifeng Han
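Abstract summary: Machine translation (MT) has become one of the important tasks of AI and development. The evaluation task of MT is not only to evaluate the quality of machine translation but also to give timely feedback to machine translation researchers. This report outlines a brief history of machine translation evaluation (MTE), the classification of MTE research methods, and cutting-edge progress.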
Reference score (independently computed attention measure): 6.85316573653194
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Since the 1950s, machine translation (MT) has become one of the important tasks of AI and development, and has experienced several different periods and stages of development, including rule-based methods, statistical methods, and recently proposed neural network-based learning methods. Accompanying these staged leaps is the evaluation research and development of MT, especially the important role of evaluation methods in statistical translation and neural translation research. The evaluation task of MT is not only to evaluate the quality of machine translation, but also to give timely feedback to machine translation researchers on the problems existing in machine translation itself, how to improve and how to optimise. In some practical application fields, such as in the absence of reference translations, the quality estimation of machine translation plays an important role as an indicator to reveal the credibility of automatically translated target languages. This report mainly includes the following contents: a brief history of machine translation evaluation (MTE), the classification of research methods on MTE, and the cutting-edge progress, including human evaluation, automatic evaluation, and evaluation of evaluation methods (meta-evaluation). Manual evaluation and automatic evaluation include reference-translation based and reference-translation independent participation; automatic evaluation methods include traditional n-gram string matching, models applying syntax and semantics, and deep learning models; evaluation of evaluation methods includes estimating the credibility of human evaluations, the reliability of the automatic evaluation, the reliability of the test set, etc. Advances in cutting-edge evaluation methods include task-based evaluation, using pre-trained language models based on big data, and lightweight optimisation models using distillation techniques.
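Abstract (reference translation): Since the 1950s, machine translation (MT) has become one of the important tasks of AI and development, and has gone through various periods and stages of development, including rule-based methods, statistical methods, and recently proposed neural network-based learning methods. Accompanying these staged leaps is the research and development of MT evaluation, in particular the important role that evaluation methods play in statistical and neural translation research. The evaluation task of MT is not only to assess the quality of machine translation, but also to provide machine translation researchers with timely feedback on the problems in machine translation itself and on how to improve and optimise it. In some practical application settings, such as when reference translations are unavailable, quality estimation of machine translation plays an important role as an indicator of the credibility of the automatically translated target-language text. This report outlines a brief history of machine translation evaluation (MTE), the classification of MTE research methods, and cutting-edge progress, covering human evaluation, automatic evaluation, and the evaluation of evaluation methods (meta-evaluation). Manual and automatic evaluation include reference-translation-based and reference-translation-independent approaches; automatic evaluation methods include traditional n-gram string matching, models applying syntax and semantics, and deep learning models; the evaluation of evaluation methods includes estimating the credibility of human evaluations, the reliability of automatic evaluation, and the reliability of the test set. Advances in cutting-edge evaluation methods include task-based evaluation, pre-trained language models based on big data, and lightweight optimised models using distillation techniques.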
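
The abstract lists traditional n-gram string matching among the automatic evaluation methods. As a rough illustration of that idea only, the following minimal sketch computes a clipped n-gram precision between a hypothesis and a single reference. It assumes whitespace tokenisation, and the function names `ngrams` and `clipped_ngram_precision` are hypothetical; it is not the metric of any particular paper or library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(hypothesis, reference, n=1):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each hypothesis n-gram is credited at most as many times as it
    appears in the reference (clipping), so repeating a matching word
    does not inflate the score. Tokenisation is plain whitespace
    splitting, which is an assumption of this sketch.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        # Hypothesis is shorter than n tokens: no n-grams to score.
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    hyp = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print(clipped_ngram_precision(hyp, ref, n=1))
    print(clipped_ngram_precision(hyp, ref, n=2))
```

A score like this rewards surface overlap only, which is one reason the abstract also lists models applying syntax and semantics, and deep learning models.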

Related papers

Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends [12.73291001580361]
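This paper first introduces the importance of document-level translation and its evaluation. It then analyzes in detail the current state of automatic evaluation schemes and metrics. It discusses the challenges facing current evaluation methods, including a lack of reference diversity, reliance on sentence-level alignment information, bias, inaccuracy, and a lack of interpretability.
Paper reference translation (metadata) (2025-04-21T02:08:42Z)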
BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
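This paper proposes a bidirectional, semantics-based evaluation method for assessing the sense distance of a translation from the source text. The approach uses the comprehensive multilingual encyclopedic dictionary BabelNet. Factual analysis shows a strong correlation between the average evaluation scores generated by the method and human assessments across various machine translation systems for the English-German language pair.
Paper reference translation (metadata) (2024-03-06T08:02:21Z)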
Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation [1.6982207802596105]
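This study examines the convergences and divergences between automatic metrics and human evaluation. Automatic assessment uses four automated metrics, while human evaluation incorporates the DQF-MQM error typology and six rubrics. The results show that human judgment is essential for evaluating the performance of advanced translation tools.
Paper reference translation (metadata) (2024-01-10T14:20:33Z)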
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
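AutoMQM is a prompting technique that asks large language models to identify and categorize errors in translations. The study examines the impact of labeled data through in-context learning and fine-tuning. AutoMQM is then evaluated with PaLM-2 models and found to improve performance over simply prompting for scores.
Paper reference translation (metadata) (2023-08-14T17:17:21Z)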
Learning Evaluation Models from Large Language Models for Sequence Generation [61.8421748792555]
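This paper proposes a three-stage evaluation model training method using large language models. Experimental results on the SummEval benchmark show that CSEM can effectively train an evaluation model without human-labeled data.
Paper reference translation (metadata) (2023-08-08T16:41:16Z)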
Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment [20.63045120292095]
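Quality estimation for cross-lingual machine translation (MT) plays an important role in evaluating translation performance. GEMBA, the first MT quality assessment metric based on Large Language Models (LLMs), uses one-step prompting to achieve state-of-the-art (SOTA) results in system-level MT quality assessment. This paper proposes the Knowledge-Prompted Estimator (KPE), a CoT prompting method that combines three one-step prompting techniques: perplexity, token-level similarity, and sentence-level similarity.
Paper reference translation (metadata) (2023-06-13T01:18:32Z)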
FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
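This paper presents FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Chinese.
Paper reference translation (metadata) (2022-10-01T05:02:04Z)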
Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
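This work carries out a motivated study of how to estimate confidence intervals precisely depending on the sample size of the translated text. It applies Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
Paper reference translation (metadata) (2021-11-15T12:09:08Z)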
Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods [9.210509295803243]
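We present a high-level, concise survey of translation quality assessment (TQA) methods, covering both manual judgment criteria and automated evaluation metrics. We hope this work will be an asset for both translation model researchers and quality assessment researchers.
Paper reference translation (metadata) (2021-05-05T18:28:10Z)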
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
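We propose ENIGMA, a new framework for estimating human evaluation scores based on off-policy evaluation in reinforcement learning. ENIGMA requires only a handful of pre-collected experience data, so no human interaction with the target policy is needed during evaluation. Experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
Paper reference translation (metadata) (2021-02-20T03:29:20Z)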
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics [64.88815792555451]
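We show that methods for evaluating metrics are highly sensitive to the translations used for assessment. We develop a method for thresholding performance improvement under an automatic metric against human judgments.
Paper reference translation (metadata) (2020-06-11T09:12:53Z)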
Unsupervised Quality Estimation for Neural Machine Translation [63.38918378182266]
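Existing approaches require large amounts of expert-annotated data, computation, and training time. We devise an unsupervised approach to quality estimation (QE) that requires no training and no access to additional resources besides the MT system itself. Our approach correlates very well with human judgments of quality and is competitive with state-of-the-art supervised QE models.
Paper reference translation (metadata) (2020-05-21T12:38:06Z)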

The related papers list is generated automatically from the titles and abstracts of papers on this site.
