Efficient Trainable Front-Ends for Neural Speech Enhancement
- URL: http://arxiv.org/abs/2002.09286v1
- Date: Thu, 20 Feb 2020 01:51:15 GMT
- Title: Efficient Trainable Front-Ends for Neural Speech Enhancement
- Authors: Jonah Casebeer, Umut Isik, Shrikant Venkataramani, Arvindh
Krishnaswamy
- Abstract summary: We present an efficient, trainable front-end based on the butterfly mechanism to compute the Fast Fourier Transform.
We show its accuracy and efficiency benefits for low-compute neural speech enhancement models.
- Score: 22.313111311130665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many neural speech enhancement and source separation systems operate in the
time-frequency domain. Such models often benefit from making their Short-Time
Fourier Transform (STFT) front-ends trainable. In current literature, these are
implemented as large Discrete Fourier Transform matrices, which are
prohibitively inefficient for low-compute systems. We present an efficient,
trainable front-end based on the butterfly mechanism to compute the Fast
Fourier Transform, and show its accuracy and efficiency benefits for
low-compute neural speech enhancement models. We also explore the effects of
making the STFT window trainable.
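The abstract only sketches the mechanism, so here is a minimal, hypothetical PyTorch sketch of what a butterfly-style trainable front-end with a trainable window could look like: each frame is multiplied by a learnable window and passed through radix-2 butterfly stages whose twiddle factors are learnable parameters, initialized so the stack reproduces an exact FFT. The class and parameter names are illustrative assumptions, not the authors' implementation.

```python
import math
import torch


class ButterflyFrontEnd(torch.nn.Module):
    """Trainable window + radix-2 butterfly stages (illustrative sketch)."""

    def __init__(self, n_fft: int = 512):
        super().__init__()
        assert n_fft & (n_fft - 1) == 0, "radix-2 butterflies need a power-of-two size"
        self.n_fft = n_fft
        self.stages = int(math.log2(n_fft))
        # Trainable analysis window, initialized to a square-root Hann window.
        self.window = torch.nn.Parameter(torch.hann_window(n_fft).sqrt())
        # One trainable twiddle vector per stage, stored as (real, imag) pairs and
        # initialized to the exact FFT twiddle factors exp(-2*pi*i*k / (2*half)).
        self.twiddles = torch.nn.ParameterList()
        for s in range(self.stages):
            half = 2 ** s
            k = torch.arange(half, dtype=torch.float32)
            w = torch.exp(-2j * math.pi * k / (2 * half))
            self.twiddles.append(torch.nn.Parameter(torch.view_as_real(w)))
        # Bit-reversal permutation used by decimation-in-time FFTs.
        idx = torch.arange(n_fft)
        rev = torch.zeros_like(idx)
        for b in range(self.stages):
            rev |= ((idx >> b) & 1) << (self.stages - 1 - b)
        self.register_buffer("bit_reverse", rev)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (..., n_fft) real-valued time-domain frames.
        x = (frames * self.window).to(torch.complex64)[..., self.bit_reverse]
        for s in range(self.stages):
            half = 2 ** s
            w = torch.view_as_complex(self.twiddles[s])        # (half,)
            blocks = x.reshape(*x.shape[:-1], -1, 2, half)     # (..., n_blocks, 2, half)
            even, odd = blocks[..., 0, :], blocks[..., 1, :] * w
            x = torch.cat([even + odd, even - odd], dim=-1)    # (..., n_blocks, 2*half)
            x = x.reshape(*x.shape[:-2], self.n_fft)
        return x  # complex spectrum; matches the windowed FFT at initialization


# At initialization the front-end reproduces torch.fft.fft on the windowed frames.
frames = torch.randn(4, 512)
fe = ButterflyFrontEnd(512)
assert torch.allclose(fe(frames), torch.fft.fft(frames * fe.window), atol=1e-3)
```

Because the twiddle factors and window are ordinary parameters, the whole front-end can be fine-tuned jointly with the enhancement network while keeping the O(N log N) butterfly structure instead of an N×N DFT matrix.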
Related papers
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel Learning Resampling method (termed LeRF) that takes advantage of both the structural priors learned by DNNs and the local continuity assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z) - Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis [16.253898272659242]
This study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to feedforward networks (FFNs).
Experiments on the large RefinedWeb dataset show that low-rank parametrization is both efficient (e.g., a 2.6× FFN speed-up with 32% of the parameters) and effective during training.
Motivated by this finding, we develop wide and structured networks that surpass current medium-sized and large-sized Transformers in perplexity and throughput.
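As a rough illustration of the low-rank FFN parametrization this entry describes, here is a hypothetical PyTorch sketch in which each dense FFN weight is replaced by a product of two thin matrices; the dimensions, rank, and names are assumptions, not the paper's configuration.

```python
import torch


class LowRankFFN(torch.nn.Module):
    """Feed-forward block with each dense weight W factored as W ~= U @ V."""

    def __init__(self, d_model: int = 768, d_ff: int = 3072, rank: int = 256):
        super().__init__()
        # A full FFN needs roughly 2 * d_model * d_ff weights;
        # the factored version needs roughly 2 * rank * (d_model + d_ff).
        self.up = torch.nn.Sequential(
            torch.nn.Linear(d_model, rank, bias=False),   # V_in
            torch.nn.Linear(rank, d_ff),                  # U_in
        )
        self.down = torch.nn.Sequential(
            torch.nn.Linear(d_ff, rank, bias=False),      # V_out
            torch.nn.Linear(rank, d_model),               # U_out
        )
        self.act = torch.nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))
```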
arXiv Detail & Related papers (2024-07-13T10:08:55Z) - Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning [42.862705980039784]
Transformers have shown promise in reinforcement learning for modeling time-varying features, but they still suffer from low data efficiency and high inference latency.
In this paper, we propose to investigate the task from a new perspective of the frequency domain.
arXiv Detail & Related papers (2024-05-30T09:43:59Z) - Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization [102.92240148504774]
We study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation.
Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters.
We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT).
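To make the butterfly factorization concrete, here is a small, hypothetical sketch of how a d×d orthogonal matrix can be composed from log2(d) sparse butterfly factors built from 2×2 rotations, so only (d/2)·log2(d) angles are learned; this illustrates the general idea, not the BOFT implementation.

```python
import torch


def butterfly_orthogonal(angles: torch.Tensor) -> torch.Tensor:
    # angles: (log2(d), d // 2) learnable rotation angles, one per pair per stage.
    n_stages, half = angles.shape
    d = 2 * half
    Q = torch.eye(d)
    for s in range(n_stages):
        stride = 2 ** s
        F = torch.zeros(d, d)
        p = 0
        for block in range(0, d, 2 * stride):
            for k in range(stride):
                i, j = block + k, block + k + stride
                c, s_ = torch.cos(angles[s, p]), torch.sin(angles[s, p])
                F[i, i], F[i, j] = c, -s_
                F[j, i], F[j, j] = s_, c
                p += 1
        Q = F @ Q  # a product of sparse orthogonal factors stays orthogonal
    return Q


theta = torch.zeros(3, 4, requires_grad=True)  # d = 8: 12 angles vs. 28 for a dense skew-symmetric parametrization
R = butterfly_orthogonal(theta)                # at zero angles, R is the identity
```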
arXiv Detail & Related papers (2023-11-10T18:59:54Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness structure in the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band
Generation and Inverse Short-Time Fourier Transform [9.606821628015933]
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform.
Experimental results show that our model synthesized speech as natural as that synthesized by VITS.
A smaller version of the model significantly outperformed a lightweight baseline model with respect to both naturalness and inference speed.
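As a rough sketch of the inverse-STFT output stage mentioned above (the multi-band part is omitted), a model could predict per-frame magnitude and phase and recover the waveform with a standard iSTFT; the shapes and names here are assumptions, not the paper's architecture.

```python
import torch


def istft_head(magnitude: torch.Tensor, phase: torch.Tensor,
               n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    # magnitude, phase: (batch, n_fft // 2 + 1, frames) predicted by the acoustic model.
    spec = magnitude * torch.exp(1j * phase)                  # complex spectrogram
    window = torch.hann_window(n_fft, device=magnitude.device)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop, window=window)


waveform = istft_head(torch.rand(2, 513, 100), torch.rand(2, 513, 100) * 6.28)
```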
arXiv Detail & Related papers (2022-10-28T08:15:05Z) - Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier
Layers [0.0]
Transformer-based language models utilize the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks.
Recent works have focused on eliminating the computational inefficiency of attention and have shown that transformer-based models can still reach competitive results without the attention layer.
A pioneering study proposed the FNet, which replaces the attention layer with the Fourier Transform (FT) in the transformer encoder architecture.
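For reference, the Fourier mixing step that FNet substitutes for self-attention can be written in a few lines; this is a minimal sketch of that published idea, not code from either paper.

```python
import torch


def fnet_mixing(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, d_model) token embeddings.
    # 2D DFT: FFT over the hidden dimension, then over the sequence dimension;
    # only the real part is kept, and there are no trainable mixing parameters.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


mixed = fnet_mixing(torch.randn(2, 128, 256))   # same shape as the input
```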
arXiv Detail & Related papers (2022-09-26T16:23:02Z) - FFC-SE: Fast Fourier Convolution for Speech Enhancement [1.0499611180329804]
Fast Fourier convolution (FFC) is a recently proposed neural operator that shows promising performance in several computer vision problems.
In this work, we design neural network architectures which adapt FFC for speech enhancement.
We find that neural networks based on FFC outperform analogous convolutional models and achieve better or comparable results relative to other speech enhancement baselines.
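A much-simplified sketch of the FFC idea, under assumed shapes: a local convolution in parallel with a global branch that applies a real 2D FFT, a pointwise convolution on stacked real/imaginary channels, and an inverse FFT. The channel splitting and normalization of the original operator are omitted, and the names are illustrative.

```python
import torch


class TinyFFC(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.local = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.spectral = torch.nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time), e.g. a spectrogram-like feature map.
        X = torch.fft.rfft2(x, norm="ortho")                        # (B, C, F, T//2+1) complex
        Z = torch.cat([X.real, X.imag], dim=1)                      # (B, 2C, F, T//2+1)
        Z = torch.relu(self.spectral(Z))
        C = x.shape[1]
        X = torch.complex(Z[:, :C], Z[:, C:])
        global_branch = torch.fft.irfft2(X, s=x.shape[-2:], norm="ortho")
        return self.local(x) + global_branch                        # fuse local and global receptive fields
```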
arXiv Detail & Related papers (2022-04-06T18:52:47Z) - Functional Regularization for Reinforcement Learning via Learned Fourier
Features [98.90474131452588]
We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis.
We show that it improves the sample efficiency of both state-based and image-based RL.
arXiv Detail & Related papers (2021-12-06T18:59:52Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
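A heavily simplified sketch in the spirit of Fourier-domain token mixing: an FFT over the token dimension, a small complex-valued MLP shared across frequencies, and an inverse FFT. AFNO's block-diagonal weights and sparsity-promoting shrinkage are omitted, and all names here are illustrative.

```python
import torch


class FourierTokenMixer(torch.nn.Module):
    """Mix tokens in the Fourier domain with a small complex-valued MLP."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        scale = d_model ** -0.5
        self.W1 = torch.nn.Parameter(scale * torch.randn(d_model, hidden, dtype=torch.cfloat))
        self.W2 = torch.nn.Parameter(scale * torch.randn(hidden, d_model, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) real-valued tokens.
        X = torch.fft.rfft(x, dim=1)                               # (batch, freq, d_model) complex
        H = X @ self.W1                                            # channel mixing at every frequency
        H = torch.view_as_complex(torch.relu(torch.view_as_real(H)).contiguous())
        X = H @ self.W2
        return torch.fft.irfft(X, n=x.shape[1], dim=1)             # back to (batch, seq_len, d_model)
```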
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Fourier Features Let Networks Learn High Frequency Functions in Low
Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions.
Results shed light on advances in computer vision and graphics that achieve state-of-the-art results.
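The Fourier feature mapping described above is simple enough to state directly; the following is a minimal sketch with an assumed frequency scale, not the paper's released code.

```python
import math
import torch


def fourier_features(v: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # v: (N, d) low-dimensional coordinates, B: (m, d) fixed random projection.
    proj = 2.0 * math.pi * v @ B.T                                # (N, m)
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)  # (N, 2m) features fed to the MLP


B = torch.randn(256, 2) * 10.0   # the scale of B controls which frequencies the MLP can fit
features = fourier_features(torch.rand(1024, 2), B)
```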
arXiv Detail & Related papers (2020-06-18T17:59:11Z)