Deep learning-based speech enhancement has seen huge improvements and has recently also been extended to full-band audio (48 kHz). However, many approaches have a rather high computational complexity and require large temporal buffers for real-time usage, e.g., due to temporal convolutions or attention. Both make these approaches infeasible on embedded devices. This work further extends DeepFilterNet, which exploits the harmonic structure of speech to allow for efficient speech enhancement (SE). Several optimizations in the training procedure, data augmentation, and network structure result in state-of-the-art SE performance while reducing the real-time factor to 0.04 on a notebook Core-i5 CPU. This makes the algorithm applicable for real-time use on embedded devices. The DeepFilterNet framework is available under an open-source license.
Most SOTA methods perform SE in the frequency domain by applying a short-time Fourier transform (STFT) to the noisy audio signal and enhancing the signal with a U-Net-like deep neural network (DNN).
However, many approaches have relatively large computational demands in terms of multiply-accumulate operations (MACs) and memory bandwidth.
That is, the higher sampling rate usually requires large FFT windows, resulting in a high number of frequency bins, which directly translates into a higher number of MACs.
Here, the frequency bins of the magnitude spectrogram are logarithmically compressed to 32 ERB bands.
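For illustration, a minimal sketch of such a band compression, assuming a rectangular band-assignment filterbank and a 960-point FFT at 48 kHz (481 bins); the band edges and weighting used in DeepFilterNet may differ:

```python
import numpy as np

def erb_hz(f_hz):
    # Equivalent rectangular bandwidth scale (Glasberg & Moore approximation).
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_band_matrix(n_freqs=481, sr=48000, n_bands=32):
    # Assign each STFT bin to one of n_bands ERB-spaced bands (rectangular weights).
    freqs = np.linspace(0, sr / 2, n_freqs)
    erb = erb_hz(freqs)
    edges = np.linspace(erb[0], erb[-1], n_bands + 1)
    band_idx = np.clip(np.digitize(erb, edges) - 1, 0, n_bands - 1)
    fb = np.zeros((n_bands, n_freqs))
    fb[band_idx, np.arange(n_freqs)] = 1.0
    fb /= np.maximum(fb.sum(axis=1, keepdims=True), 1e-12)  # average bins per band
    return fb

# Compress a magnitude spectrogram [n_freqs, n_frames] to 32 ERB bands.
fb = erb_band_matrix()
mag = np.abs(np.random.randn(481, 100))        # placeholder magnitude spectrogram
erb_feat = 20 * np.log10(fb @ mag + 1e-10)     # log-compressed ERB features
print(erb_feat.shape)                          # (32, 100)
```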
However, this band representation only allows real-valued processing, which is why PercepNet additionally applies a comb filter for finer enhancement of the periodic component of speech.
GaGNet [5], for instance, uses two so-called glance and gaze stages after a feature extraction stage.
The glance module works on a coarse magnitude domain, while the gaze module processes the spectrum in the complex domain, allowing it to reconstruct the spectrum at a finer resolution.
In this work, we extend DeepFilterNet [2], which also operates in two stages.
DeepFilterNet takes advantage of a speech model consisting of a periodic and a stochastic component.
The first stage operates in the ERB domain, only enhancing the speech envelope, while the second stage uses deep filtering [6, 7] to enhance the periodic component.
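As an illustration of the deep filtering operation [6, 7], the following is a simplified sketch in which an order-N complex filter is applied along the time axis for each of the lower frequency bins; look-ahead and the exact indexing of the real implementation are omitted here:

```python
import numpy as np

def deep_filter(spec, coefs):
    """Apply per-bin complex FIR filters along time.

    spec:  complex STFT of the noisy signal, shape [T, F]
    coefs: complex filter coefficients, shape [T, N, F]
           (predicted by the DNN for the lower frequency bins)
    """
    T, N, F = coefs.shape
    out = np.zeros((T, F), dtype=complex)
    for i in range(N):
        # multiply the spectrum delayed by i frames with the i-th coefficient
        shifted = np.roll(spec[:, :F], shift=i, axis=0)
        shifted[:i] = 0.0
        out += coefs[:, i, :] * shifted
    return out

# Identity behaviour: real part of the coefficient for the current frame set to 1,
# remaining coefficients 0, leaves the spectrum unchanged.
T, F, N = 10, 96, 5
spec = np.random.randn(T, F) + 1j * np.random.randn(T, F)
coefs = np.zeros((T, N, F), dtype=complex)
coefs[:, 0, :] = 1.0
assert np.allclose(deep_filter(spec, coefs), spec)
```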
In this paper, we describe several optimizations resulting in SOTA performance on the Voicebank+Demand [8] and deep noise suppression (DNS) 4 blind test challenge dataset [9].
Moreover, these optimizations lead to an increased run-time performance, making it possible to run the model in real-time on a Raspberry Pi 4.
2. METHODS

2.1. Signal Model and the DeepFilterNet framework

We assume noise and speech to be uncorrelated, such that:
x(t) = s(t) ∗ h(t) + n(t),   (1)

where s(t) is a clean speech signal, n(t) is an additive noise, and h(t) a room impulse response modeling the reverberant environment, resulting in a noisy mixture x(t).
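For illustration, a hedged sketch of how a noisy mixture according to Eq. (1) could be synthesized; signal lengths, SNR handling, and RIR scaling are simplified assumptions rather than the exact DeepFilterNet data pipeline:

```python
import numpy as np

def make_mixture(speech, rir, noise, snr_db=5.0):
    # Reverberant speech: s(t) * h(t)
    reverberant = np.convolve(speech, rir)[: len(speech)]
    # Scale noise to the desired SNR relative to the reverberant speech
    p_s = np.mean(reverberant ** 2)
    p_n = np.mean(noise[: len(speech)] ** 2) + 1e-12
    noise_scaled = noise[: len(speech)] * np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    # Noisy mixture: x(t) = s(t) * h(t) + n(t)
    return reverberant + noise_scaled

speech = np.random.randn(48000) * 0.05
rir = np.random.randn(4800) * np.exp(-np.arange(4800) / 800.0)
noise = np.random.randn(48000)
x = make_mixture(speech, rir, noise, snr_db=5.0)
```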
Fig. 1. Schematic overview of the DeepFilterNet2 speech enhancement process.
The first stage operates in a compressed ERB domain, which serves the purpose of reducing computational complexity while modeling the auditory perception of the human ear.
Finally, to achieve faster convergence, especially at the beginning of training, we use batch scheduling [10], starting with a batch size of 8 and gradually increasing it to 96.
Fig. 2. Learning rate, weight decay and batch size scheduling used for training.
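As a rough sketch, batch scheduling [10] can be realized by making the batch size a function of the training epoch; the linear ramp and epoch count below are assumptions, while the schedule actually used is shown in Fig. 2:

```python
def batch_size_schedule(epoch, max_epochs=100, start=8, end=96):
    # Increase the batch size from `start` to `end` over training; the exact
    # shape of the DeepFilterNet2 schedule (Fig. 2) may differ from this linear ramp.
    frac = min(epoch / max(max_epochs - 1, 1), 1.0)
    return int(round(start + frac * (end - start)))

# e.g. rebuild the DataLoader at the start of every epoch with the scheduled size:
# loader = DataLoader(dataset, batch_size=batch_size_schedule(epoch), shuffle=True)
print([batch_size_schedule(e) for e in (0, 25, 50, 99)])  # [8, 30, 52, 96]
```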
2.3. Multi-Target Loss

We adopt the spectrogram loss Lspec from [2].
Additionally, we use a multi-resolution (MR) spectrogram loss, where the enhanced spectrogram Y(k, f) is first transformed into the time domain before computing multiple STFTs with windows from 5 ms to 40 ms [11].
To propagate the gradient for this loss, we use the PyTorch STFT/ISTFT, which is numerically sufficiently close to the original DeepFilterNet processing loop implemented in Rust.
where Y′_i = STFT_i(y) is the i-th STFT of the predicted TD signal y with window sizes in {5, 10, 20, 40} ms, and c = 0.3 is a compression parameter [1].
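A minimal PyTorch sketch of such a multi-resolution spectrogram loss with magnitude compression c = 0.3; the hop sizes and the restriction to a magnitude-only term are simplifying assumptions compared to the full loss of [2]:

```python
import torch

def mr_spec_loss(y, s, sr=48000, win_ms=(5, 10, 20, 40), c=0.3):
    """Multi-resolution spectrogram loss between enhanced (y) and clean (s) signals."""
    loss = 0.0
    for ms in win_ms:
        n_fft = int(sr * ms / 1000)
        window = torch.hann_window(n_fft, device=y.device)
        Y = torch.stft(y, n_fft, hop_length=n_fft // 2, window=window, return_complex=True)
        S = torch.stft(s, n_fft, hop_length=n_fft // 2, window=window, return_complex=True)
        # compare magnitude-compressed spectrograms
        loss = loss + torch.mean((Y.abs() ** c - S.abs() ** c) ** 2)
    return loss

y = torch.randn(1, 48000, requires_grad=True)  # enhanced time-domain signal (1 s)
s = torch.randn(1, 48000)                      # clean target
mr_spec_loss(y, s).backward()                  # gradient propagates through the STFTs
```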
Compared to DeepFilterNet [2], we drop the α loss term, since the employed heuristic is only a poor approximation of the local speech periodicity.
Also, DF may enhance speech in non-voiced sections, and the network can disable its effect by setting the real part of the coefficient at t0 to 1 and the remaining coefficients to 0.
2.4. Data and Augmentation

While DeepFilterNet was trained on the deep noise suppression (DNS) 3 challenge dataset [12], we train DeepFilterNet2 on the English part of DNS4 [9], which contains more full-band noise and speech samples.
In speech enhancement, usually only background noise and, in some cases, reverberation are reduced [1, 11, 2].
In this work, we further extended the SE concept to declipping.
Therefore, we distinguish between augmentations, which are applied to both speech and noise samples, and distortions in the data pre-processing pipeline.
Distortions, on the other hand, are only applied to speech samples for noisy mixture creation.
The clean speech target is not affected by a distortion transform.
Thus, the DNN learns to reconstruct the original, undistorted speech signal.
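For the declipping case, a hedged sketch of such a distortion: a clipped copy of the speech enters the noisy mixture, while the loss target remains the unmodified clean signal (the clipping threshold is an illustrative assumption):

```python
import numpy as np

def apply_clipping_distortion(speech, clip_db=-6.0):
    # Hard-clip the speech copy used for the noisy input; the training
    # target remains the original, undistorted `speech`.
    threshold = np.max(np.abs(speech)) * 10 ** (clip_db / 20)
    return np.clip(speech, -threshold, threshold)

speech = np.random.randn(48000) * 0.1
noisy_input_speech = apply_clipping_distortion(speech)  # fed into mixture creation
target = speech                                         # loss target stays clean
```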
Currently, the DeepFilterNet framework supports the following randomized augmentations:
• Random 2nd order filtering [13]
• Gain changes
• Equalizer via 2nd order filters
• Resampling for speed and pitch changes [13]
• Addition of colored noise (not used for speech samples)

In addition to denoising, DeepFilterNet will try to revert the following distortions:
Convolutions for both ERB and complex features are now processed within the encoder, concatenated, and passed to a grouped linear (GLinear) layer and single GRU.
We also apply grouping in the output layer of the DF decoder, based on the rationale that neighboring frequencies are sufficient for predicting the filter coefficients.
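A minimal sketch of a grouped linear (GLinear) layer, assuming the feature (frequency) axis is split into independent groups that each get their own weight matrix; group count and feature sizes are illustrative:

```python
import torch
import torch.nn as nn

class GroupedLinear(nn.Module):
    """Apply an independent linear map to each feature group."""
    def __init__(self, in_features, out_features, groups=8):
        super().__init__()
        assert in_features % groups == 0 and out_features % groups == 0
        self.groups = groups
        self.weight = nn.Parameter(
            torch.randn(groups, in_features // groups, out_features // groups) * 0.02
        )

    def forward(self, x):                  # x: [batch, time, in_features]
        b, t, _ = x.shape
        x = x.view(b, t, self.groups, -1)  # split the feature axis into groups
        # per-group matrix multiply, then merge the groups again
        x = torch.einsum("btgi,gio->btgo", x, self.weight)
        return x.reshape(b, t, -1)

x = torch.randn(2, 100, 256)
print(GroupedLinear(256, 256, groups=8)(x).shape)  # torch.Size([2, 100, 256])
```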
We adopt the post-filter first proposed by Valin et al. [1], with the aim of slightly over-attenuating noisy TF bins while adding some gain back to less noisy bins.
We perform this on the predicted gains in the first stage:
G′(k, b) ← G(k, b) · sin(π/2 · G(k, b)),
G(k, b) ← ((1 + β) · G(k, b)) / (1 + β + G′(k, b)).   (8)
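A direct transcription of Eq. (8) as a small sketch applied to the predicted gains; the value of β is an assumption:

```python
import numpy as np

def post_filter(gains, beta=0.02):
    # Transcription of Eq. (8); beta is an assumed small tuning constant.
    g_prime = gains * np.sin(np.pi / 2 * gains)
    return (1 + beta) * gains / (1 + beta + g_prime)

gains = np.linspace(0.0, 1.0, 5)
print(post_filter(gains))
```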
3. EXPERIMENTS

3.1. Implementation details

As stated in section 2.4, we train DeepFilterNet2 on the DNS4 dataset using overall more than 500 h of full-band clean speech, approx.
In this work, we use 20 ms windows, an overlap of 50 %, and a look-ahead of two frames resulting in an overall algorithmic delay of 40 ms. We take 32 ERB bands, fDF = 5 kHz, a DF order of N = 5, and a look-ahead l = 2 frames.
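For reference, the overall algorithmic delay follows directly from these parameters (a small worked check, assuming a hop size of half the window):

```python
window_ms = 20          # STFT window
overlap = 0.5           # 50 % overlap -> 10 ms hop
lookahead_frames = 2    # frames of look-ahead (l = 2)

hop_ms = window_ms * (1 - overlap)
delay_ms = window_ms + lookahead_frames * hop_ms
print(delay_ms)         # 40.0 ms overall algorithmic delay
```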
Table 1 footnotes:
(a) Metrics and RTF measured with source code and weights provided at https://github.com/xiph/rnnoise/
(b) Note that RNNoise runs single-threaded
(c) RTF measured with source code provided at https://github.com/huyanxin/DeepComplexCRN
(d) Composite and STOI metrics provided by the same authors in [16]
(e) Metrics and RTF measured with source code and weights provided at https://github.com/hit-thusz-RookieCJ/FullSubNet-plus
(f) RTF measured with source code provided at https://github.com/Andong-Li-speech/GaGNet/
The number of parameters has slightly increased over DeepFilterNet (Sec. 2.5), but the network is able to run more than twice as fast and achieves a 0.27 higher PESQ score.
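The real-time factor (RTF) denotes processing time divided by audio duration; a simple way to measure it is sketched below, where enhance() is a placeholder for the respective model's inference function:

```python
import time

def measure_rtf(enhance, audio, sr=48000):
    # RTF = wall-clock processing time / audio duration; RTF < 1 means real-time capable.
    start = time.perf_counter()
    enhance(audio)
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sr)
```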
GaGNet [5] achieves a similar RTF while having good SE performance.
However, it only runs fast when provided with the whole audio and requires large temporal buffers due to its use of large temporal convolution kernels.
Table 2 shows DNSMOS P.835 [22] results on the DNS4 blind test set.
While DeepFilterNet [2] was not able to improve the speech quality mean opinion score (SIGMOS), with DeepFilterNet2 we also obtain good results for the background and overall MOS values.
Moreover, DeepFilterNet2 comes relatively close to the minimum DNSMOS values that were used to select clean speech samples to train the DNS4 baseline NSNet2 (SIG=4.2, BAK=4.5, OVL=4.0) [9] further emphasizing its good SE performance.
In future work, we plan to extend the idea of speech enhancement to other enhancements, like correcting lowpass characteristics due to the current room environment.
[2] Hendrik Schröter, Alberto N. Escalante-B., Tobias Rosenkranz, and Andreas Maier, "DeepFilterNet: A low complexity speech enhancement framework for full-band audio based on deep filtering," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.

[3] Shengkui Zhao, Bin Ma, Karn N. Watcharasupat, and Woon-Seng Gan, "FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.

[4] Guochen Yu, Yuansheng Guan, Weixin Meng, Chengshi Zheng, and Hui Wang, "DMF-Net: A decoupling-style multi-band fusion model for real-time full-band speech enhancement," arXiv preprint arXiv:2203.00472, 2022.

[5] Andong Li, Chengshi Zheng, Lu Zhang, and Xiaodong Li, "Glance and gaze: A collaborative learning framework for single-channel speech enhancement," Applied Acoustics, vol. 187, 2022.

[6] Hendrik Schröter, Tobias Rosenkranz, Alberto Escalante Banuelos, Marc Aubreville, and Andreas Maier, "CLCNet: Deep learning-based noise reduction for hearing aids using complex linear coding," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.

[7] Wolfgang Mack and Emanuël A. P. Habets, "Deep Filtering: Signal extraction and reconstruction using complex time-frequency filters," IEEE Signal Processing Letters, vol. 27, 2020.

[8] Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, and Junichi Yamagishi, "Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech," in SSW, 2016.

[9] Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, et al., "ICASSP 2022 deep noise suppression challenge," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.

[10] Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le, "Don't decay the learning rate, increase the batch size," arXiv preprint arXiv:1711.00489, 2017.

[11] Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, and Kyogu Lee, "Real-time denoising and dereverberation with tiny recurrent U-Net," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[12] Chandan K. A. Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, and Sriram Srinivasan, "Interspeech 2021 deep noise suppression challenge," in INTERSPEECH, 2021.

[13] Jean-Marc Valin, "A hybrid DSP/deep learning approach to real-time full-band speech enhancement," in 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2018.

[14] Sebastian Braun, Hannes Gamper, Chandan K. A. Reddy, and Ivan Tashev, "Towards efficient models for real-time deep noise suppression," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[15] Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, and Lei Xie, "DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement," in INTERSPEECH, 2020.

[16] Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, and Tao Yu, "S-DCCRN: Super wide band DCCRN with learnable complex feature for speech enhancement," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.

[17] Shubo Lv, Yanxin Hu, Shimin Zhang, and Lei Xie, "DCCRN+: Channel-wise subband DCCRN with SNR estimation for speech enhancement," in INTERSPEECH, 2021.

[18] Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, and Helen Meng, "FullSubNet+: Channel attention FullSubNet with complex spectrograms for speech enhancement," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.

[19] ITU, "Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs," ITU-T Recommendation P.862.2, 2007.

[20] Cees H. Taal, Richard C. Hendriks, Richard Heusdens, and Jesper Jensen, "An algorithm for intelligibility prediction of time–frequency weighted noisy speech," IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[21] Yi Hu and Philipos C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, 2007.

[22] Chandan K. A. Reddy, Vishak Gopal, and Ross Cutler, "DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.