Fugu-MT Paper Translation (Abstract): Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Paper summary: Discovering Non-monotonic Autoregressive Orderings with Variational Inference

arxiv url: http://arxiv.org/abs/2110.15797v1
Date: Wed, 27 Oct 2021 16:08:09 GMT
Status: translation complete
Last updated in system: 2021-11-01 13:42:38.049526
Title: Discovering Non-monotonic Autoregressive Orderings with Variational Inference
Title (reference translation): Discovery of non-monotonic autoregressive orderings via variational inference
Authors: Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao
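Abstract summary: We develop an unsupervised, parallelizable learner that discovers high-quality generation orders purely from training data. We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass. Empirical results on language modeling tasks show that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
Reference score (attention score computed independently by this site): 67.27561153666211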
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The predominant approach for language modeling is to process sequences from left to right, but this eliminates a source of information: the order by which the sequence was generated. One strategy to recover this information is to decode both the content and ordering of tokens. Existing approaches supervise content and ordering by designing problem-specific loss functions and pre-training with an ordering pre-selected. Other recent works use iterative search to discover problem-specific orderings for training, but suffer from high time complexity and cannot be efficiently parallelized. We address these limitations with an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data -- no domain knowledge required. The learner contains an encoder network and decoder language model that perform variational inference with autoregressive orders (represented as permutation matrices) as latent variables. The corresponding ELBO is not differentiable, so we develop a practical algorithm for end-to-end optimization using policy gradients. We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass. Permutations then serve as target generation orders for training an insertion-based Transformer language model. Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
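Abstract (reference translation): The predominant approach to language modeling is to process sequences from left to right, but this discards a source of information: the order in which the sequence was generated. One strategy for recovering this information is to decode both the content and the ordering of tokens. Existing approaches supervise content and ordering by designing problem-specific loss functions and pre-training with a pre-selected ordering. Other recent work uses iterative search to discover problem-specific orderings for training, but suffers from high time complexity and cannot be parallelized efficiently. To address these limitations, we develop an unsupervised, parallelizable learner that discovers high-quality generation orders purely from training data. The learner contains an encoder network and a decoder language model that perform variational inference with autoregressive orders (represented as permutation matrices) as latent variables. Because the corresponding ELBO is not differentiable, we develop a practical algorithm for end-to-end optimization using policy gradients. We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass. The permutations then serve as target generation orders for training an insertion-based Transformer language model. Empirical results on language modeling tasks show that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.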
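
To make the phrase "autoregressive orders (represented as permutation matrices)" concrete, here is a minimal sketch in plain Python. It is not the authors' code. The convention that row t of the matrix marks the sentence position generated at step t, the helper names `permutation_matrix` and `generation_steps`, and the example sentence are all assumptions chosen for illustration.

```python
from typing import List


def permutation_matrix(order: List[int]) -> List[List[int]]:
    """Build an n x n 0/1 matrix for a generation order.

    Assumed convention (not taken from the paper): row t has its single 1
    in column order[t], meaning the token at sentence position order[t]
    is generated at step t.
    """
    n = len(order)
    if sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of 0..n-1")
    return [[1 if col == order[step] else 0 for col in range(n)]
            for step in range(n)]


def generation_steps(tokens: List[str], order: List[int]) -> List[List[str]]:
    """Replay an order as insertions.

    At each step one more token is revealed; tokens revealed so far are
    shown in their final sentence order, as an insertion-based decoder
    would hold them.
    """
    matrix = permutation_matrix(order)
    placed: List[int] = []          # sentence positions generated so far
    partials: List[List[str]] = []
    for row in matrix:
        placed.append(row.index(1))  # position generated at this step
        partials.append([tokens[p] for p in sorted(placed)])
    return partials


tokens = ["a", "dog", "barks", "loudly"]
left_to_right = [0, 1, 2, 3]         # the usual monotonic order
content_first = [1, 2, 3, 0]         # hypothetical non-monotonic order

for partial in generation_steps(tokens, content_first):
    print(" ".join(partial))
```

Both orders end with the same sentence; only the sequence of partial hypotheses differs. In the setup the abstract describes, such an order is a latent variable proposed by the encoder rather than a hand-picked list as it is here.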

Related papers