Self-Attention Generative Adversarial Network for Speech Enhancement
- URL: http://arxiv.org/abs/2010.09132v3
- Date: Sat, 6 Feb 2021 19:51:48 GMT
- Title: Self-Attention Generative Adversarial Network for Speech Enhancement
- Authors: Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Philipp Koch, Ngoc Q. K.
Duong, Ian McLoughlin, Alfred Mertins
- Abstract summary: Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation.
We propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN.
Experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance.
- Score: 37.14341228976058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing generative adversarial networks (GANs) for speech enhancement solely
rely on the convolution operation, which may obscure temporal dependencies
across the sequence input. To remedy this issue, we propose a self-attention
layer adapted from non-local attention, coupled with the convolutional and
deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal
input. Further, we empirically study the effect of placing the self-attention
layer at the (de)convolutional layers with varying layer indices as well as at
all of them when memory allows. Our experiments show that introducing
self-attention to SEGAN leads to consistent improvement across the objective
evaluation metrics of enhancement performance. Furthermore, applying it at
different (de)convolutional layers does not significantly alter performance,
suggesting that it can be conveniently applied at the highest-level
(de)convolutional layer with the smallest memory overhead.
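For illustration, the snippet below is a minimal PyTorch sketch (not the authors' released code) of a SAGAN-style non-local self-attention layer adapted to 1-D feature maps, roughly as it could be attached to the output of a chosen (de)convolutional layer in SEGAN. The channel-reduction factor of 8 and the zero-initialized gate gamma are assumptions carried over from the SAGAN convention, not details stated in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    """Non-local self-attention over a 1-D (batch, channels, time) feature map."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convolutions project the features into query/key/value spaces.
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        # Zero-initialized gate: the layer starts as an identity mapping and
        # gradually learns how much non-local context to blend in.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query(x)                          # (B, C/r, T)
        k = self.key(x)                            # (B, C/r, T)
        v = self.value(x)                          # (B, C,   T)
        attn = torch.bmm(q.transpose(1, 2), k)     # (B, T, T) pairwise scores
        attn = F.softmax(attn, dim=-1)             # normalize over key positions
        out = torch.bmm(v, attn.transpose(1, 2))   # (B, C, T) non-local mixture
        return self.gamma * out + x                # residual connection

# Hypothetical usage inside a (de)convolutional stack:
#   attn = SelfAttention1d(channels=feats.size(1))
#   feats = attn(conv_block(feats))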
Related papers
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via
Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z) - Deep Augmentation: Self-Supervised Learning with Transformations in Activation Space [19.495587566796278]
We introduce Deep Augmentation, an approach to implicit data augmentation using dropout or PCA to transform a targeted layer within a neural network to improve performance and generalization.
We demonstrate Deep Augmentation through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning.
arXiv Detail & Related papers (2023-03-25T19:03:57Z) - Kernel function impact on convolutional neural networks [10.98068123467568]
We study the usage of kernel functions at the different layers in a convolutional neural network.
We show how one can effectively leverage kernel functions by introducing more distortion-aware pooling layers.
We propose Kernelized Dense Layers (KDL), which replace fully-connected layers.
arXiv Detail & Related papers (2023-02-20T19:57:01Z) - Skip-Attention: Improving Vision Transformers by Paying Less Attention [55.47058516775423]
Vision transformers (ViTs) use expensive self-attention operations in every layer.
We propose SkipAt, a method to reuse self-attention from preceding layers to approximate attention at one or more subsequent layers.
We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS.
arXiv Detail & Related papers (2023-01-05T18:59:52Z) - Exploiting Explainable Metrics for Augmented SGD [43.00691899858408]
There are several unanswered questions about how learning under optimization really works and why certain strategies are better than others.
We propose new explainability metrics that measure the redundant information in a network's layers.
We then exploit these metrics to augment Stochastic Gradient Descent (SGD) by adaptively adjusting the learning rate in each layer to improve generalization performance.
arXiv Detail & Related papers (2022-03-31T00:16:44Z) - Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We plug this learned aggregation layer into a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Rethinking Skip Connection with Layer Normalization in Transformers and
ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how scale factors in the skip connection affect its effectiveness.
arXiv Detail & Related papers (2021-05-15T11:44:49Z) - Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining
Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By carefully designing the Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z) - When Can Self-Attention Be Replaced by Feed Forward Layers? [40.991809705930955]
We show that replacing the upper self-attention layers in the encoder with feed forward layers leads to no performance drop, and even minor gains.
Our experiments offer insights to how self-attention layers process the speech signal.
arXiv Detail & Related papers (2020-05-28T10:35:49Z)