Fugu-MT Paper Translation (Overview): CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition

arxiv url: http://arxiv.org/abs/2402.06923v1
Date: Sat, 10 Feb 2024 11:13:13 GMT
Status: Translation complete
Last updated (in system): 2024-02-13 18:46:28.029285
Title: CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition
Authors: Ioannis Ziogas, Hessa Alfalahi, Ahsan H. Khandoker, Leontios J. Hadjileontiadis
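Abstract summary: CochCeps-Augment is a bio-inspired masking augmentation task for self-supervised contrastive learning of speech representations. The results suggest that CochCeps-Augment could become a standard tool for speech emotion recognition analysis.
Reference score (attention score computed by this site): 5.974778743092437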
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Self-supervised learning (SSL) for automated speech recognition in terms of emotional content can be heavily degraded by the presence of noise, which reduces how efficiently the intricate temporal and spectral informative structures of speech can be modeled. Recently, SSL on large speech datasets, as well as new audio-specific SSL proxy tasks such as temporal and frequency masking, have emerged, yielding superior performance compared to classic approaches drawn from the image augmentation domain. Our proposed contribution builds upon this successful paradigm by introducing CochCeps-Augment, a novel bio-inspired masking augmentation task for self-supervised contrastive learning of speech representations. Specifically, we utilize the newly introduced bio-inspired cochlear cepstrogram (CCGRAM) to derive noise-robust representations of input speech, which are then further refined through a self-supervised learning scheme. The latter employs SimCLR to generate contrastive views of a CCGRAM through masking of its angle and quefrency dimensions. Our experimental approach and validation on the K-EmoCon emotion recognition benchmark dataset, for the first time via a speaker-independent approach, feature unsupervised pre-training, linear probing, and fine-tuning. Our results suggest that CochCeps-Augment can serve as a standard tool in speech emotion recognition analysis, showing the added value of incorporating bio-inspired masking as an informative augmentation task for self-supervision. Our code for implementing CochCeps-Augment will be made available at: https://github.com/GiannisZgs/CochCepsAugment.
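The abstract says SimCLR generates contrastive views of a CCGRAM by masking its angle and quefrency dimensions. The NumPy sketch below illustrates that general idea only; it is not the authors' implementation (their code is announced at the repository above). Assumptions, all hypothetical: the CCGRAM is a 2-D array with angle on axis 0 and quefrency on axis 1; each mask zeroes one random contiguous band (SpecAugment-style); the helper names `mask_axis` and `make_view`, the mask widths, and the temperature are illustrative choices; and a random linear projection stands in for the encoder. The loss is the standard SimCLR NT-Xent objective.

```python
import numpy as np


def mask_axis(x, axis, max_width, rng):
    """Zero one random contiguous band of width 0..max_width along `axis` (returns a copy)."""
    x = x.copy()
    size = x.shape[axis]
    width = int(rng.integers(0, min(max_width, size) + 1))
    start = int(rng.integers(0, size - width + 1))
    index = [slice(None)] * x.ndim
    index[axis] = slice(start, start + width)
    x[tuple(index)] = 0.0
    return x


def make_view(ccgram, rng, max_angle=4, max_quefrency=4):
    """One augmented view: mask the (assumed) angle axis 0, then the quefrency axis 1."""
    view = mask_axis(ccgram, axis=0, max_width=max_angle, rng=rng)
    return mask_axis(view, axis=1, max_width=max_quefrency, rng=rng)


def nt_xent(z1, z2, temperature=0.5):
    """SimCLR NT-Xent loss; row i of z1 and row i of z2 form a positive pair."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude each sample's similarity to itself
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)  # numerically stable log-sum-exp
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_denom
    return float(-log_prob.mean())


rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 32, 20))  # stand-in for 4 CCGRAMs (angle x quefrency)
W = rng.standard_normal((32 * 20, 16))    # hypothetical, untrained linear "encoder"
views1 = np.stack([make_view(c, rng) for c in batch])
views2 = np.stack([make_view(c, rng) for c in batch])
loss = nt_xent(views1.reshape(4, -1) @ W, views2.reshape(4, -1) @ W)
print(f"NT-Xent loss: {loss:.4f}")
```

In real pre-training the encoder would be a trainable network updated to minimize this loss, so that two differently masked views of the same utterance map to nearby embeddings while views of other utterances in the batch are pushed apart.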

Related Papers

Introducing Semantics into Speech Encoders [91.37001512418111]
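We propose an unsupervised method that incorporates semantic information from large language models into self-supervised speech encoders without labeled speech transcriptions. The proposed method achieves performance similar to supervised methods trained on more than 100 hours of labeled speech transcriptions.
Paper reference translation (metadata) (2022-11-15T18:44:28Z)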
Self-Supervised Learning for Speech Enhancement through Synthesis [5.924928860260821]
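We propose a denoising vocoder (DeVo) approach, in which a vocoder accepts noisy representations and learns to synthesize clean speech directly. We demonstrate a causal version that can run on streaming audio with 10 ms of latency and minimal performance degradation.
Paper reference translation (metadata) (2022-11-04T16:06:56Z)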
SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
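We propose a self-supervised learning method for pre-training an encoder on unlabeled audio data. Our main goal is to learn audio representations that generalize across a wide variety of speech and non-speech tasks.
Paper reference translation (metadata) (2022-11-02T23:45:33Z)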
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining [138.86293836634323]
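MaskCLIP incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. Guided by the language encoder, MaskCLIP achieves superior results in linear probing, fine-tuning, and zero-shot performance.
Paper reference translation (metadata) (2022-08-25T17:59:58Z)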
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond [64.85076239939336]
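Self-supervised learning (SSL) in vision may follow a trajectory similar to that in NLP, where generative pretext tasks based on masked prediction (e.g., BERT) have become the de facto standard SSL practice. The success of masked image modeling has revived the masked autoencoder.
Paper reference translation (metadata) (2022-07-30T09:59:28Z)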
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
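Masked prediction pre-training in self-supervised learning (SSL) has shown remarkable progress in speech recognition. This paper proposes two supervision-guided codebook generation methods to improve automatic speech recognition (ASR) performance.
Paper reference translation (metadata) (2022-06-21T06:08:30Z)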
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? [86.53044183309824]
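We investigate the factors behind the success of self-supervised learning on speaker-related tasks. Experimental results on the VoxCeleb-1 dataset suggest that the benefit of SSL for the speaker verification (SV) task comes from a combination of the masked prediction loss, data scale, and model size.
Paper reference translation (metadata) (2022-04-27T08:35:57Z)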
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
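Speech emotion recognition (SER) is a challenging task that plays an important role in human-computer interaction. One of the main challenges of SER is data scarcity. This paper proposes a transfer learning strategy combined with spectrogram augmentation.
Paper reference translation (metadata) (2021-08-05T10:39:39Z)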
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning [20.39971017940006]
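Speech SimCLR is a new self-supervised objective for speech representation learning. During training, Speech SimCLR applies augmentation to the raw speech and to its spectrogram.
Paper reference translation (metadata) (2020-10-27T02:09:06Z)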

The related papers list is generated automatically from the titles and abstracts of papers on this site.
