Fugu-MT 論文翻訳(概要): GNCformer Enhanced Self-attention for Automatic Speech Recognition

論文の概要: GNCformer Enhanced Self-attention for Automatic Speech Recognition

arxiv url: http://arxiv.org/abs/2305.12755v1
Date: Mon, 22 May 2023 06:26:05 GMT
ステータス: 翻訳完了
システム内更新日: 2023-05-23 17:57:57.100494
Title: GNCformer Enhanced Self-attention for Automatic Speech Recognition
Title（参考訳）: 自動音声認識のためのgncフォーマ強化セルフアテンション
Authors: J. Li, Z. Duan, S. Li, X. Yu, G. Yang
Abstract要約: 頑健な特徴抽出のために、ESA(Enhanced Self-Attention)メカニズムが提案されている。本稿では,自動音声認識(ASR)タスクのためのトランスフォーマーネットワークのエンコーダ層にESAを組み込み,新たに提案したモデルをGNCformerと呼ぶ。 GNCformerの有効性は、Aishell-1とHKUSTの2つのデータセットを用いて検証されている。
参考スコア（独自算出の注目度）: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper,an Enhanced Self-Attention (ESA) mechanism has been put forward for robust feature extraction.The proposed ESA is integrated with the recursive gated convolution and self-attention mechanism.In particular, the former is used to capture multi-order feature interaction and the latter is for global feature extraction.In addition, the location of interest that is suitable for inserting the ESA is also worth being explored.In this paper, the ESA is embedded into the encoder layer of the Transformer network for automatic speech recognition (ASR) tasks, and this newly proposed model is named GNCformer. The effectiveness of the GNCformer has been validated using two datasets, that are Aishell-1 and HKUST.Experimental results show that, compared with the Transformer network,0.8%CER,and 1.2%CER improvement for these two mentioned datasets, respectively, can be achieved.It is worth mentioning that only 1.4M additional parameters have been involved in our proposed GNCformer.
Abstract（参考訳）: In this paper,an Enhanced Self-Attention (ESA) mechanism has been put forward for robust feature extraction.The proposed ESA is integrated with the recursive gated convolution and self-attention mechanism.In particular, the former is used to capture multi-order feature interaction and the latter is for global feature extraction.In addition, the location of interest that is suitable for inserting the ESA is also worth being explored.In this paper, the ESA is embedded into the encoder layer of the Transformer network for automatic speech recognition (ASR) tasks, and this newly proposed model is named GNCformer. GNCformerの有効性は、Aishell-1とHKUSTの2つのデータセットを用いて検証されている。実験の結果、Transformerネットワークと比較して、これらの2つのデータセットに対してそれぞれ0.8%CERと1.2%CERの改善が達成できることが示されている。

関連論文リスト

DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps [30.53564087005569]
弱教師付きセマンティックセグメンテーション(WSSS)は、典型的には限定的なセマンティックアノテーションを使用して、初期クラスアクティベーションマップ(CAM)を取得する。クラスアクティベーション応答と高次元空間のセマンティック情報との結合が不十分なため、CAMはオブジェクト共起や不活性化の傾向にある。本稿では,意味認識重み行列を用いて埋め込み表現を再構成する新しい手法である,埋め込み情報のデュアル最適化であるDOEIを提案する。
論文参考訳（メタデータ） (2025-02-21T19:06:01Z)
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
本稿では,新しいdetextbfCoupled dutextbfAl-interactive lineatextbfR atttextbfEntion (CARE) 機構を提案する。まず,非対称な特徴分離戦略を提案し,非対称的に学習プロセスを局所帰納バイアスと長距離依存に分解する。分離学習方式を採用し,特徴間の相補性を完全に活用することにより,高い効率性と精度を両立させることができる。
論文参考訳（メタデータ） (2024-11-25T07:56:13Z)
OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
OminiControlは、イメージ条件をトレーニング済みのDiffusion Transformer(DiT)モデルに統合するフレームワークである。コアとなるOminiControlはパラメータ再利用機構を活用しており、強力なバックボーンとしてイメージ条件をエンコードすることができる。 OminiControlは、主観駆動生成や空間的に整合した条件を含む、幅広いイメージコンディショニングタスクを統一的に処理する。
論文参考訳（メタデータ） (2024-11-22T17:55:15Z)
Dual Aggregation Transformer for Image Super-Resolution [92.41781921611646]
画像SRのための新しいトランスモデルDual Aggregation Transformerを提案する。 DATは、ブロック間およびブロック内二重方式で、空間次元とチャネル次元にまたがる特徴を集約する。我々のDATは現在の手法を超越している。
論文参考訳（メタデータ） (2023-08-07T07:39:39Z)
Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
マルチスケールクラス表現応答類似性分析(ClassRepSim)と呼ばれる新しいタイプの分析法を提案する。 ResNetスタイルのアーキテクチャにSTACモジュールを追加すると、最大1.6%の精度が向上することを示す。 ClassRepSim分析の結果は、STACモジュールの効果的なパラメータ化を選択するために利用することができ、競争性能が向上する。
論文参考訳（メタデータ） (2023-06-16T18:29:26Z)
Lightweight Vision Transformer with Bidirectional Interaction [63.65115590184169]
本研究では,視覚変換器の局所的・グローバル的情報をモデル化するためのFASA機構を提案する。 FASAに基づいて、我々はFAT(Fully Adaptive Transformer)ファミリーという軽量なビジョンバックボーンのファミリーを開発した。
論文参考訳（メタデータ） (2023-06-01T06:56:41Z)
Improving Transformer-based Networks With Locality For Automatic Speaker Verification [40.06788577864032]
話者埋め込み抽出のためのトランスフォーマーベースアーキテクチャが検討されている。本研究では,2方向の局所性モデルを用いてトランスフォーマーを改良する。本稿では,VoxCelebデータセットと大規模Microsoft内部多言語(MS-internal)データセットに対する提案手法の評価を行った。
論文参考訳（メタデータ） (2023-02-17T01:04:51Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
本稿ではTransSTAMという新しい手法を提案する。Transformerを利用して各オブジェクトの外観特徴とオブジェクト間の空間的時間的関係の両方をモデル化する。提案手法はMOT16, MOT17, MOT20を含む複数の公開ベンチマークで評価され, IDF1とHOTAの両方で明確な性能向上を実現している。
論文参考訳（メタデータ） (2022-05-31T01:19:18Z)
Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [17.854940064699985]
本稿では,緩和的自己認識機構を備えたトランスフォーマーアーキテクチャを提案する。 Miti-DETRは、各注意層の入力をそのレイヤの出力に予約し、「非注意」情報が注意伝播に関与するようにします。 Miti-DETRは、既存のDETRモデルに対する平均検出精度と収束速度を大幅に向上させる。
論文参考訳（メタデータ） (2021-12-26T03:23:59Z)
Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
音声認識(ASR)とテキスト要約(TS)を組み合わせることで、音声要約を実現することができる ASR誤差はカスケード法における出力要約の品質に直接影響する。本稿では、ASRの誤りに対して頑健なカスケード音声要約モデルを提案し、ASRが生成した複数の仮説を利用して、ASRの誤りが要約に与える影響を緩和する。
論文参考訳（メタデータ） (2021-11-16T03:00:29Z)
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition [45.858039215825656]
本稿では,グローバルな注意的局所再帰(GALR)ネットワークを採用し,生波形を直接入力とする新しいエンコーダを提案する。ベンチマークデータセットAISHELL-2と,5,000時間21,000時間の大規模マンダリン音声コーパスを用いて実験を行った。
論文参考訳（メタデータ） (2021-06-08T12:12:33Z)
AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection [8.39479809973967]
Few-shot Object Detection (FSOD) は、未確認の物体に迅速に適応できる検出器の学習を目的としている。既存の方法では、共有コンポーネントを用いて分類と局所化のサブタスクを実行することで、この問題を解決している。本稿では,2つのサブタスクの明示的な分解を考慮し,両者の情報を活用して特徴表現の強化を図ることを提案する。
論文参考訳（メタデータ） (2020-11-30T10:21:32Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。