Fugu-MT Paper Translation (Abstract): Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

Paper summary: Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

arxiv url: http://arxiv.org/abs/2209.10875v1
Date: Thu, 22 Sep 2022 09:19:08 GMT
Status: Translation complete
Last updated in system: 2022-09-23 13:15:34.819386
Title: Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model
Title (reference translation): Semantically consistent data augmentation for neural machine translation via a conditional masked language model
Authors: Qiao Cheng, Jin Huang, Yitao Duan
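Abstract summary: This paper proposes a new data augmentation method for neural machine translation. The method is based on a Conditional Masked Language Model (CMLM). We show that CMLM can enforce semantic consistency by conditioning on both source and target during substitution.
Reference score (independently computed attention measure): 5.756426081817803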
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a new data augmentation method for neural machine translation that can enforce stronger semantic consistency both within and across languages. Our method is based on Conditional Masked Language Model (CMLM) which is bi-directional and can be conditional on both left and right context, as well as the label. We demonstrate that CMLM is a good technique for generating context-dependent word distributions. In particular, we show that CMLM is capable of enforcing semantic consistency by conditioning on both source and target during substitution. In addition, to enhance diversity, we incorporate the idea of soft word substitution for data augmentation which replaces a word with a probabilistic distribution over the vocabulary. Experiments on four translation datasets of different scales show that the overall solution results in more realistic data augmentation and better translation quality. Our approach consistently achieves the best performance in comparison with strong and recent works and yields improvements of up to 1.90 BLEU points over the baseline.
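Abstract (reference translation): This paper proposes a new data augmentation method for neural machine translation that strengthens semantic consistency both within and across languages. The method is based on a Conditional Masked Language Model (CMLM). We show that CMLM is a good technique for generating context-dependent word distributions. In particular, we show that CMLM can preserve semantic consistency by conditioning on both source and target during substitution. In addition, to increase diversity, we incorporate the idea of soft word substitution for data augmentation, which replaces a word with a probability distribution over the vocabulary. Experiments on four translation datasets of different scales show that the overall solution yields more realistic data augmentation and better translation quality. Compared with strong and recent work, the proposed method consistently achieves the best performance, with improvements of up to 1.90 BLEU points over the baseline.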
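
The soft word substitution described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formula, so the sketch assumes that the embedding of a replaced word is the probability-weighted average of all vocabulary embeddings, and that each target position is replaced independently with a fixed probability. The names `soft_embedding`, `augment`, `toy_predict_dist`, `replace_prob`, and the toy embedding table are hypothetical; `toy_predict_dist` stands in for a CMLM that would condition on both the source and the target sentence.

```python
import random

# Toy embedding table (hypothetical): 3 vocabulary entries, 2 dimensions.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def soft_embedding(probs, embeddings):
    """Expected embedding under a distribution over the vocabulary (assumed form)."""
    dim = len(embeddings[0])
    return [sum(p * row[j] for p, row in zip(probs, embeddings)) for j in range(dim)]


def toy_predict_dist(source_ids, target_ids, position):
    """Stand-in for a CMLM. A real model would use all three arguments;
    this toy version ignores them and returns a fixed distribution."""
    return [0.5, 0.25, 0.25]


def augment(source_ids, target_ids, predict_dist, embeddings, replace_prob, rng):
    """Return one embedding per target token. Each token is softly replaced
    with probability replace_prob (an assumed hyperparameter)."""
    out = []
    for pos, tok in enumerate(target_ids):
        if rng.random() < replace_prob:
            # Condition on both source and target, as the abstract describes.
            probs = predict_dist(source_ids, target_ids, pos)
            out.append(soft_embedding(probs, embeddings))
        else:
            # Keep the ordinary (hard) embedding of the original token.
            out.append(list(embeddings[tok]))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    print(augment([0, 1], [2, 0, 1], toy_predict_dist, EMBEDDINGS, 0.5, rng))
```

With `replace_prob` set to 0 the function returns the ordinary token embeddings, and with 1 every position becomes a mixture, so the parameter controls how much of the sentence is softened.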

Related papers

Diversity-Oriented Data Augmentation with Large Language Models [9.548912625579947]
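We propose a Diversity-oriented data Augmentation framework (DoAug). Specifically, we use a diversity-oriented fine-tuning approach to train an LLM as a diverse paraphraser that can augment text datasets by generating diverse paraphrases. The results show that the fine-tuned LLM augmenter improves diversity while preserving label consistency, which in turn improves the robustness and performance of downstream tasks.
Reference translation of paper (metadata) (2025-02-17T11:00:40Z)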
Deterministic Reversible Data Augmentation for Neural Machine Translation [36.10695293724949]
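This paper proposes Deterministic Reversible Data Augmentation (DRDA), a simple yet effective data augmentation method for neural machine translation. Requiring no extra corpora or model changes, DRDA outperforms strong baselines by a clear margin on several translation tasks. DRDA also shows good robustness on noisy, low-resource, and cross-domain datasets.
Reference translation of paper (metadata) (2024-06-04T17:39:23Z)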
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
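Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. We propose a new approach to cross-lingual semantic parsing that explicitly minimizes cross-lingual divergence between latent variables using optimal transport.
Reference translation of paper (metadata) (2023-07-09T04:52:31Z)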
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
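We present a large-scale comparative study of lexical substitution methods that use both older and more recent language models. We show that the already competitive results of SOTA LMs/MLMs can be improved substantially further if information about the target word is injected properly.
Reference translation of paper (metadata) (2022-06-07T16:16:19Z)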
Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
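We propose a new data augmentation paradigm called Continuous Semantic Augmentation (CsaNMT). CsaNMT augments each training instance with an adjacency region that can cover adequate variants of literal expression under the same meaning.
Reference translation of paper (metadata) (2022-04-14T08:16:28Z)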
Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
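UNMT models are trained on pseudo-parallel data with translated source text, but translate natural source sentences at inference time. This source discrepancy between training and inference hinders the translation performance of UNMT models. We propose an online self-training approach that simultaneously uses pseudo-parallel data with natural source text.
Reference translation of paper (metadata) (2022-03-16T04:50:27Z)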
Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
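We propose a joint approach to regularize NMT models at both the representation level and the gradient level. We show that the approach is effective in reducing off-target translation occurrences and improving zero-shot translation performance.
Reference translation of paper (metadata) (2021-09-10T10:52:21Z)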
On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
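Language coverage bias is important in neural machine translation (NMT). Through carefully designed experiments, we provide a comprehensive analysis of language coverage bias in the training data. We propose two simple and effective approaches to alleviate the language coverage bias problem.
Reference translation of paper (metadata) (2021-06-07T01:55:34Z)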

The related papers list is generated automatically from the titles and abstracts of papers on this site.
