Fugu-MT Paper Translation (Abstract): Improving speech recognition models with small samples for air traffic control systems

Paper abstract: Improving speech recognition models with small samples for air traffic control systems

arxiv url: http://arxiv.org/abs/2102.08015v1
Date: Tue, 16 Feb 2021 08:28:52 GMT
Status: Translation complete
Last updated in system: 2021-02-17 14:57:01.127039
Title: Improving speech recognition models with small samples for air traffic control systems
Title (reference translation): Improving Speech Recognition Models Using Small Samples for Air Traffic Control Systems
Authors: Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, and Zhengmao Chen
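Abstract summary: This work proposes a novel training approach based on pretraining and transfer learning to address the challenge of small training samples. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results show that ASR performance improved significantly on all three datasets.
Reference score (attention score computed independently by this site): 9.322392779428505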
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples is an expert- and domain-dependent task. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune a pretrained or other optimized baseline model to finally achieve the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.
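The abstract names two augmentation ideas, masking and speed perturbation, without giving their details. The following is a minimal sketch of both, not the authors' implementation: the function names `speed_perturb` and `mask_frames`, the linear-interpolation resampling, the zero mask value, and the span-masking scheme are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Resample a waveform by linear interpolation so it plays `factor` times faster.

    Assumption: simple linear interpolation; factor > 1 shortens the signal,
    factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not signal:
        return []
    new_len = max(1, int(round(len(signal) / factor)))
    out = []
    for i in range(new_len):
        # Position of output sample i in the original signal, clamped to the end.
        pos = min(i * factor, len(signal) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def mask_frames(
    features: Sequence[Sequence[float]],
    max_width: int,
    num_masks: int,
    rng: random.Random,
    mask_value: float = 0.0,
) -> List[List[float]]:
    """Return a copy of `features` (frames x dims) with random frame spans masked.

    Assumption: each mask overwrites a contiguous span of 0..max_width frames
    with `mask_value`; the input is left unmodified.
    """
    masked = [list(frame) for frame in features]
    n = len(masked)
    for _ in range(num_masks):
        if n == 0:
            break
        width = rng.randint(0, min(max_width, n))
        start = rng.randint(0, n - width)
        for t in range(start, start + width):
            masked[t] = [mask_value] * len(masked[t])
    return masked
```

Factors such as 0.9 and 1.1 are common choices for speed perturbation in ASR work in general; the abstract does not state which values the authors used.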
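Abstract (reference translation): In the domain of air traffic control (ATC), training a practical automatic speech recognition (ASR) model always faces the problem of small training samples, because collecting and annotating speech samples is an expert- and domain-dependent task. This work proposes a novel training approach based on pretraining and transfer learning to address this issue, and develops an improved end-to-end deep learning model to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from the unlabeled samples of a given dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Transfer learning is then applied to fine-tune a pretrained or otherwise optimized baseline model to accomplish the supervised ASR task. Thanks to the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized on a joint corpus consisting of baseline samples and newly transcribed samples from the target dataset. This joint corpus construction strategy increases the size and diversity of the training samples, which is important for addressing the small transcribed corpus. In addition, speed perturbation is applied to augment the newly transcribed samples and further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results show that ASR performance improved significantly on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.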
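The headline result is stated in terms of character error rate (CER). As a reminder of what that metric measures, here is a minimal sketch using the standard definition: character-level edit distance (substitutions, deletions, insertions) divided by the reference length. The function names are illustrative, and the paper's exact scoring setup (text normalization, handling of mixed-language transcripts) is not given in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` is 0.25, since one substitution is needed against a four-character reference.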

Related papers

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
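Existing reweighting strategies focus mainly on group-level data importance. We propose new algorithms for dynamic, instance-level data reweighting. Our framework makes it possible to devise reweighting strategies that deprioritize redundant or uninformative data.
Paper reference translation (metadata) (2025-02-10T17:57:15Z)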
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
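We propose a unified training strategy for speech recognition systems. We show that training a single model for all three tasks improves VSR and AVSR performance. We also introduce a greedy pseudo-labelling approach to make more effective use of unlabelled samples.
Paper reference translation (metadata) (2024-11-04T16:46:53Z)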
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
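We propose a test-time adaptation framework. We maintain a lightweight key-value memory for retrieving features from instance-agnostic historical samples and instance-aware boosting samples. We theoretically justify the rationale behind the method and empirically verify its effectiveness on both out-of-distribution and cross-domain datasets.
Paper reference translation (metadata) (2024-10-20T15:58:43Z)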
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
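We propose a new adaptation framework called Data Adaptive Traceback. Specifically, we use a zero-shot-based method to extract the subset of the pre-training data most related to the downstream task. We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images, together with a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
Paper reference translation (metadata) (2024-07-11T18:01:58Z)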
ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
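A previous study, RefTeacher, made the first attempt to tackle this task by adopting a teacher-student framework to provide pseudo-confidence supervision and attention-based supervision. That approach is incompatible with current state-of-the-art visual grounding models that follow a Transformer-based pipeline. We propose an active retraining approach for semi-supervised visual grounding, abbreviated as ACTRESS.
Paper reference translation (metadata) (2024-07-03T16:33:31Z)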
Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset [0.0]
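We propose an iterative self-transfer learning method for training neural networks on small datasets. We show that the proposed method can improve model performance by nearly an order of magnitude on small datasets.
Paper reference translation (metadata) (2023-06-14T18:48:04Z)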
Adaptive Multi-Corpora Language Model Training for Speech Recognition [13.067901680326932]
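We propose an adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus over the course of training. Compared with static sampling strategy baselines, the proposed approach yields notable improvements.
Paper reference translation (metadata) (2022-11-09T06:54:50Z)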
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
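Test-time Adaptation (TTA) aims to adapt a model trained on source domains so that it yields better predictions for test samples. Single-Utterance Test-time Adaptation (SUTA) is the first TTA study in the speech domain.
Paper reference translation (metadata) (2022-03-27T06:38:39Z)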
ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems [15.527854608553824]
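ATCSpeechNet is proposed to tackle the problem of converting communication speech in air traffic control systems into human-readable text. An end-to-end paradigm is developed to convert speech waveforms directly into text, without any feature engineering or lexicon. Experimental results on the ATCSpeech corpus show that the proposed approach achieves high performance with a very small labeled corpus.
Paper reference translation (metadata) (2021-02-17T02:27:09Z)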
Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
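We propose a pipeline solution to improve speaker verification on a small forensic field dataset. By leveraging large-scale out-of-domain datasets, we propose a knowledge-distillation-based objective function for teacher-student learning. We show that the proposed objective function can effectively improve the performance of teacher-student learning on short utterances.
Paper reference translation (metadata) (2020-09-21T00:58:40Z)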
One-Shot Object Detection without Fine-Tuning [62.39210447209698]
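We propose a two-stage model consisting of a first-stage Matching-FCOS network and a second-stage Structure-Aware Relation Module. We also propose novel training strategies that effectively improve detection performance. The proposed method consistently exceeds state-of-the-art one-shot performance on multiple datasets.
Paper reference translation (metadata) (2020-05-08T01:59:23Z)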

The related papers list is generated automatically from the titles and abstracts of papers on this site.
