Abstract: The deep complex U-Net and the convolutional recurrent network (CRN)
achieve state-of-the-art performance for monaural speech enhancement. Both are
encoder-decoder structures with skip connections that rely heavily on the
representation power of their complex-valued convolutional layers. In this
paper, we propose a complex
convolutional block attention module (CCBAM) to boost the representation power
of the complex-valued convolutional layers by constructing more informative
features. The CCBAM is a lightweight and general module that can be easily
integrated into any complex-valued convolutional layer. We integrate CCBAM
with the deep complex U-Net and CRN to enhance their performance for speech
enhancement. We further propose a mixed loss function to jointly optimize the
complex models in both the time-frequency (TF) domain and the time domain. By
integrating CCBAM and the mixed loss, we form a new end-to-end (E2E) complex
speech enhancement framework. Ablation experiments and objective evaluations
show the superior performance of the proposed approaches.
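
The idea of a mixed loss that spans both domains can be illustrated with a minimal sketch. The abstract does not specify the individual loss terms or how they are weighted, so everything below is an assumption made for illustration: the TF-domain term is a mean squared error on complex spectra, the time-domain term is a mean squared error on waveforms, the weight `alpha` is hypothetical, and a naive unwindowed framed DFT stands in for a proper STFT.

```python
import cmath
import math


def dft_frames(signal, frame_len=8, hop=4):
    """Naive framed DFT (unwindowed stand-in for an STFT).

    Returns one list of complex bins (0 .. frame_len // 2) per frame.
    The signal must contain at least frame_len samples.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def tf_loss(estimate, target):
    """Assumed TF-domain term: MSE over complex bins (real and imaginary parts)."""
    est, ref = dft_frames(estimate), dft_frames(target)
    errors = [abs(e - r) ** 2
              for est_frame, ref_frame in zip(est, ref)
              for e, r in zip(est_frame, ref_frame)]
    return sum(errors) / len(errors)


def time_loss(estimate, target):
    """Assumed time-domain term: MSE between waveforms."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def mixed_loss(estimate, target, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical weight."""
    return alpha * tf_loss(estimate, target) + (1 - alpha) * time_loss(estimate, target)


if __name__ == "__main__":
    clean = [math.sin(0.3 * n) for n in range(16)]
    enhanced = [s + 0.1 for s in clean]  # toy "model output" with a constant offset
    print(mixed_loss(enhanced, clean))
```

In an actual training setup the same structure would be written with a differentiable STFT from a deep learning framework, so that gradients from both terms reach the complex-valued model.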