Fugu-MT Paper Translation (Summary): Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

Paper summary: Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

arxiv url: http://arxiv.org/abs/2103.15722v1
Date: Mon, 29 Mar 2021 16:09:00 GMT
Status: Translation complete
Last updated in system: 2021-03-30 15:17:48.381248
Title: Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
Title (reference translation): Transformer-based end-to-end speech recognition using residual Gaussian-based self-attention
Authors: Chengdong Liang, Menglong Xu, Xiao-Lei Zhang
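Abstract summary: We introduce relative-position-awareness self-attention (RPSA). It maintains the global-range dependency modeling ability of self-attention while also improving its localness modeling ability. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition.
Reference score (independently computed attention score): 9.709229853995987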
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition due to its strong context modeling ability. However, when applied to long sequence data, its accuracy is reduced. This is because its weighted-average operator may disperse the attention distribution, so that the relationship between adjacent signals is ignored. To address this issue, in this paper, we introduce relative-position-awareness self-attention (RPSA). It not only maintains the global-range dependency modeling ability of self-attention, but also improves the localness modeling ability. Because the local window length of the original RPSA is fixed and sensitive to different test data, here we propose Gaussian-based self-attention (GSA), whose window length is learnable and adapts to the test data automatically. We further generalize GSA to a new residual Gaussian self-attention (resGSA) for further performance improvement. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition respectively. Experimental results on the AISHELL-1 Mandarin speech recognition corpus demonstrate the effectiveness of the proposed methods. For example, the resGSA-Transformer achieves a character error rate (CER) of 5.86% on the test set, a 7.8% relative reduction compared with the SA-Transformer. Although the performance of the proposed resGSA-Transformer is only slightly better than that of the RPSA-Transformer, its window length does not have to be tuned manually.
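Abstract (reference translation): Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition because of its strong context modeling ability. However, its accuracy drops when it is applied to long sequence data. This is because the weighted-average operator can disperse the attention distribution, so that the relationship between adjacent signals is ignored. In this paper, we introduce relative-position-awareness self-attention (RPSA). It not only maintains the global-range dependency modeling ability of self-attention but also improves the localness modeling ability. Because the local window length of the original RPSA is fixed and sensitive to different test data, we propose Gaussian-based self-attention (GSA), whose window length is learnable and adapts automatically to the test data. We further generalize GSA to a new residual Gaussian self-attention (resGSA) to improve performance. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition respectively. Experimental results on the AISHELL-1 Mandarin speech recognition corpus demonstrate the effectiveness of the proposed methods. For example, the resGSA-Transformer achieves a character error rate (CER) of 5.86% on the test set, a 7.8% relative reduction compared with the SA-Transformer. Although the proposed resGSA-Transformer performs only slightly better than the RPSA-Transformer, its window length does not have to be tuned manually.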
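
The abstract does not give the GSA formula, so the following is only a minimal sketch of the general idea of a Gaussian locality window in self-attention. It assumes the window enters as an additive bias `-(i - j)^2 / (2 * sigma^2)` on the scaled dot-product logits. The names `gaussian_biased_attention`, `local_mass`, and `sigma` are hypothetical, `sigma` is a fixed number here rather than a learned parameter, and the residual variant (resGSA) is not modeled.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_biased_attention(q, k, v, sigma):
    """Scaled dot-product attention with an additive Gaussian locality bias.

    q, k, v: lists of T vectors (lists of floats).
    sigma: window width in frames (assumption: fixed here, learnable in GSA).
    Returns (outputs, weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            score = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            # Assumed form of the bias: penalize keys far from query position i.
            bias = -((i - j) ** 2) / (2.0 * sigma ** 2)
            logits.append(score + bias)
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights

def local_mass(weights, radius):
    """Average attention mass falling within `radius` frames of each query."""
    per_query = [sum(w for j, w in enumerate(row) if abs(i - j) <= radius)
                 for i, row in enumerate(weights)]
    return sum(per_query) / len(per_query)

random.seed(0)
T, d = 40, 8
x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]

_, w_narrow = gaussian_biased_attention(x, x, x, sigma=2.0)  # strong locality
_, w_wide = gaussian_biased_attention(x, x, x, sigma=1e6)    # bias is negligible
print(local_mass(w_narrow, 3), local_mass(w_wide, 3))
```

With a very large `sigma` the bias vanishes and the weights reduce to plain self-attention. A small `sigma` concentrates the weights near the query position, which is the localness effect the abstract describes.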

Related papers

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
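Self-Taught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems. Results show that STAR achieves an average relative word error rate reduction of 13.5% across 14 domains. STAR is highly data-efficient, requiring less than one hour of unlabeled data.
Paper reference translation (metadata) (2024-05-23T04:27:11Z)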
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants [39.00433193973159]
This work is the first unified study of the efficiency of self-attention-based Transformer variants across text, speech, and vision. We identify the input length threshold (tipping point) at which efficient Transformer variants become more efficient than vanilla models. To this end, we introduce L-HuBERT.
Paper reference translation (metadata) (2023-06-14T17:59:02Z)
End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system [61.148549738631814]
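End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model. This simplifies the ASR system but introduces a drawback for contextual ASR: E2E models perform worse on utterances containing infrequent proper nouns. In this paper, we propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
Paper reference translation (metadata) (2022-02-18T03:26:02Z)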
Voice Quality and Pitch Features in Transformer-Based Speech Recognition [3.921076451326107]
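This study investigates the effect of incorporating voice quality and pitch features, both together and separately, into a Transformer-based ASR model. We found mean Word Error Rate relative reductions of up to 5.6% on the LibriSpeech benchmark.
Paper reference translation (metadata) (2021-12-21T17:49:06Z)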
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation [11.52842516726486]
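This paper proposes a Transformer-based ASR model that incorporates a time-reduction layer within the Transformer encoder layers. We also introduce a fine-tuning approach for pre-trained ASR models using self-knowledge distillation (S-KD) to further improve ASR performance. With language model (LM) fusion, we achieve state-of-the-art word error rate (WER) results for Transformer-based ASR models.
Paper reference translation (metadata) (2021-03-17T21:02:36Z)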
Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
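State-of-the-art neural language models (LMs) represented by Transformers are highly complex. This paper proposes a Bayesian learning framework for Transformer LM estimation.
Paper reference translation (metadata) (2021-02-09T10:55:27Z)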
Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement [26.596930749375474]
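This paper introduces a latent sequential variable with Markovian dependencies to switch between different VAE architectures over time. We derive the corresponding variational expectation-maximization algorithm to estimate the model parameters and enhance the speech signal.
Paper reference translation (metadata) (2021-02-08T11:45:02Z)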
Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
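We adapt the relative positional encoding scheme to the Speech Transformer. As a result, the network can better adapt to the variable distributions present in speech data.
Paper reference translation (metadata) (2020-05-20T09:53:06Z)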
Weak-Attention Suppression For Transformer Based Speech Recognition [33.30436927415777]
We propose Weak-Attention Suppression (WAS). We demonstrate that WAS leads to consistent Word Error Rate (WER) improvement over strong Transformer baselines.
Paper reference translation (metadata) (2020-05-18T23:49:40Z)
Conformer: Convolution-augmented Transformer for Speech Recognition [60.119604551507805]
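Recently, models based on Transformers and convolutional neural networks (CNNs) have shown promising results in automatic speech recognition (ASR). We propose Conformer, a convolution-augmented Transformer for speech recognition. On the widely used LibriSpeech benchmark, it achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test/test-other.
Paper reference translation (metadata) (2020-05-16T20:56:25Z)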
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition [66.47000813920619]
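We propose a non-autoregressive end-to-end speech recognition system called LASO. Because it is non-autoregressive, LASO predicts each text token in the sequence without depending on the other tokens. We conduct experiments on AISHELL-1, a publicly available Chinese dataset.
Paper reference translation (metadata) (2020-05-11T04:45:02Z)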

The related papers list is generated automatically from the titles and abstracts of papers on this site.