Efficient Trainable Front-Ends for Neural Speech Enhancement
- URL: http://arxiv.org/abs/2002.09286v1
- Date: Thu, 20 Feb 2020 01:51:15 GMT
- Title: Efficient Trainable Front-Ends for Neural Speech Enhancement
- Authors: Jonah Casebeer, Umut Isik, Shrikant Venkataramani, Arvindh
Krishnaswamy
- Abstract summary: We present an efficient, trainable front-end based on the butterfly mechanism to compute the Fast Fourier Transform.
We show its accuracy and efficiency benefits for low-compute neural speech enhancement models.
- Score: 22.313111311130665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many neural speech enhancement and source separation systems operate in the
time-frequency domain. Such models often benefit from making their Short-Time
Fourier Transform (STFT) front-ends trainable. In current literature, these are
implemented as large Discrete Fourier Transform matrices, which are
prohibitively inefficient for low-compute systems. We present an efficient,
trainable front-end based on the butterfly mechanism to compute the Fast
Fourier Transform, and show its accuracy and efficiency benefits for
low-compute neural speech enhancement models. We also explore the effects of
making the STFT window trainable.
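The abstract only sketches the mechanism, so here is a minimal, hypothetical PyTorch sketch of what a butterfly-style trainable front-end with a trainable window could look like: each frame is multiplied by a learnable window and passed through radix-2 butterfly stages whose twiddle factors are learnable parameters, initialized so the stack reproduces an exact FFT. The class and parameter names are illustrative assumptions, not the authors' implementation.

```python
import math
import torch


class ButterflyFrontEnd(torch.nn.Module):
    """Trainable window + radix-2 butterfly stages (illustrative sketch)."""

    def __init__(self, n_fft: int = 512):
        super().__init__()
        assert n_fft & (n_fft - 1) == 0, "radix-2 butterflies need a power-of-two size"
        self.n_fft = n_fft
        self.stages = int(math.log2(n_fft))
        # Trainable analysis window, initialized to a square-root Hann window.
        self.window = torch.nn.Parameter(torch.hann_window(n_fft).sqrt())
        # One trainable twiddle vector per stage, stored as (real, imag) pairs and
        # initialized to the exact FFT twiddle factors exp(-2*pi*i*k / (2*half)).
        self.twiddles = torch.nn.ParameterList()
        for s in range(self.stages):
            half = 2 ** s
            k = torch.arange(half, dtype=torch.float32)
            w = torch.exp(-2j * math.pi * k / (2 * half))
            self.twiddles.append(torch.nn.Parameter(torch.view_as_real(w)))
        # Bit-reversal permutation used by decimation-in-time FFTs.
        idx = torch.arange(n_fft)
        rev = torch.zeros_like(idx)
        for b in range(self.stages):
            rev |= ((idx >> b) & 1) << (self.stages - 1 - b)
        self.register_buffer("bit_reverse", rev)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (..., n_fft) real-valued time-domain frames.
        x = (frames * self.window).to(torch.complex64)[..., self.bit_reverse]
        for s in range(self.stages):
            half = 2 ** s
            w = torch.view_as_complex(self.twiddles[s])        # (half,)
            blocks = x.reshape(*x.shape[:-1], -1, 2, half)     # (..., n_blocks, 2, half)
            even, odd = blocks[..., 0, :], blocks[..., 1, :] * w
            x = torch.cat([even + odd, even - odd], dim=-1)    # (..., n_blocks, 2*half)
            x = x.reshape(*x.shape[:-2], self.n_fft)
        return x  # complex spectrum; matches the windowed FFT at initialization


# At initialization the front-end reproduces torch.fft.fft on the windowed frames.
frames = torch.randn(4, 512)
fe = ButterflyFrontEnd(512)
assert torch.allclose(fe(frames), torch.fft.fft(frames * fe.window), atol=1e-3)
```

Because the twiddle factors and window are ordinary parameters, the whole front-end can be fine-tuned jointly with the enhancement network while keeping the O(N log N) butterfly structure instead of an N×N DFT matrix.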
Related papers
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel Learning Resampling method (termed LeRF) that takes advantage of both the structural priors learned by DNNs and the local continuity assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z) - Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis [16.253898272659242]
This study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to feedforward networks (FFNs).
Experiments on the large RefinedWeb dataset show that low-rank parametrization is both efficient (e.g., a 2.6× FFN speed-up with 32% of the parameters) and effective during training.
Motivated by this finding, we develop wide and structured networks that surpass current medium-sized and large-sized Transformers in perplexity and throughput.
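As a rough illustration of the low-rank FFN parametrization this entry describes, here is a hypothetical PyTorch sketch in which each dense FFN weight is replaced by a product of two thin matrices; the dimensions, rank, and names are assumptions, not the paper's configuration.

```python
import torch


class LowRankFFN(torch.nn.Module):
    """Feed-forward block with each dense weight W factored as W ~= U @ V."""

    def __init__(self, d_model: int = 768, d_ff: int = 3072, rank: int = 256):
        super().__init__()
        # A full FFN needs roughly 2 * d_model * d_ff weights;
        # the factored version needs roughly 2 * rank * (d_model + d_ff).
        self.up = torch.nn.Sequential(
            torch.nn.Linear(d_model, rank, bias=False),   # V_in
            torch.nn.Linear(rank, d_ff),                  # U_in
        )
        self.down = torch.nn.Sequential(
            torch.nn.Linear(d_ff, rank, bias=False),      # V_out
            torch.nn.Linear(rank, d_model),               # U_out
        )
        self.act = torch.nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))
```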
arXiv Detail & Related papers (2024-07-13T10:08:55Z) - Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning [42.862705980039784]
Transformers have shown promise in reinforcement learning for modeling time-varying features, but they still suffer from low data efficiency and high inference latency.
In this paper, we propose to investigate the task from a new perspective of the frequency domain.
arXiv Detail & Related papers (2024-05-30T09:43:59Z) - Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization [102.92240148504774]
We study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation.
Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters.
We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT).
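To make the butterfly factorization concrete, here is a small, hypothetical sketch of how a d×d orthogonal matrix can be composed from log2(d) sparse butterfly factors built from 2×2 rotations, so only (d/2)·log2(d) angles are learned; this illustrates the general idea, not the BOFT implementation.

```python
import torch


def butterfly_orthogonal(angles: torch.Tensor) -> torch.Tensor:
    # angles: (log2(d), d // 2) learnable rotation angles, one per pair per stage.
    n_stages, half = angles.shape
    d = 2 * half
    Q = torch.eye(d)
    for s in range(n_stages):
        stride = 2 ** s
        F = torch.zeros(d, d)
        p = 0
        for block in range(0, d, 2 * stride):
            for k in range(stride):
                i, j = block + k, block + k + stride
                c, s_ = torch.cos(angles[s, p]), torch.sin(angles[s, p])
                F[i, i], F[i, j] = c, -s_
                F[j, i], F[j, j] = s_, c
                p += 1
        Q = F @ Q  # a product of sparse orthogonal factors stays orthogonal
    return Q


theta = torch.zeros(3, 4, requires_grad=True)  # d = 8: 12 angles vs. 28 for a dense skew-symmetric parametrization
R = butterfly_orthogonal(theta)                # at zero angles, R is the identity
```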
arXiv Detail & Related papers (2023-11-10T18:59:54Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness structure in the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band
Generation and Inverse Short-Time Fourier Transform [9.606821628015933]
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform.
Experimental results show that our model synthesized speech as natural as that synthesized by VITS.
A smaller version of the model significantly outperformed a lightweight baseline model with respect to both naturalness and inference speed.
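As a rough sketch of the inverse-STFT output stage mentioned above (the multi-band part is omitted), a model could predict per-frame magnitude and phase and recover the waveform with a standard iSTFT; the shapes and names here are assumptions, not the paper's architecture.

```python
import torch


def istft_head(magnitude: torch.Tensor, phase: torch.Tensor,
               n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    # magnitude, phase: (batch, n_fft // 2 + 1, frames) predicted by the acoustic model.
    spec = magnitude * torch.exp(1j * phase)                  # complex spectrogram
    window = torch.hann_window(n_fft, device=magnitude.device)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop, window=window)


waveform = istft_head(torch.rand(2, 513, 100), torch.rand(2, 513, 100) * 6.28)
```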
arXiv Detail & Related papers (2022-10-28T08:15:05Z) - Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier
Layers [0.0]
Transformer-based language models utilize the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks.
Recent works have focused on eliminating the computational inefficiency of attention and have shown that transformer-based models can still reach competitive results without the attention layer.
A pioneering study proposed the FNet, which replaces the attention layer with the Fourier Transform (FT) in the transformer encoder architecture.
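For reference, the Fourier mixing step that FNet substitutes for self-attention can be written in a few lines; this is a minimal sketch of that published idea, not code from either paper.

```python
import torch


def fnet_mixing(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, d_model) token embeddings.
    # 2D DFT: FFT over the hidden dimension, then over the sequence dimension;
    # only the real part is kept, and there are no trainable mixing parameters.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


mixed = fnet_mixing(torch.randn(2, 128, 256))   # same shape as the input
```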
arXiv Detail & Related papers (2022-09-26T16:23:02Z) - FFC-SE: Fast Fourier Convolution for Speech Enhancement [1.0499611180329804]
Fast Fourier convolution (FFC) is a recently proposed neural operator that shows promising performance in several computer vision problems.
In this work, we design neural network architectures which adapt FFC for speech enhancement.
We find that neural networks based on FFC outperform analogous convolutional models and achieve better or comparable results relative to other speech enhancement baselines.
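A much-simplified sketch of the FFC idea, under assumed shapes: a local convolution in parallel with a global branch that applies a real 2D FFT, a pointwise convolution on stacked real/imaginary channels, and an inverse FFT. The channel splitting and normalization of the original operator are omitted, and the names are illustrative.

```python
import torch


class TinyFFC(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.local = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.spectral = torch.nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time), e.g. a spectrogram-like feature map.
        X = torch.fft.rfft2(x, norm="ortho")                        # (B, C, F, T//2+1) complex
        Z = torch.cat([X.real, X.imag], dim=1)                      # (B, 2C, F, T//2+1)
        Z = torch.relu(self.spectral(Z))
        C = x.shape[1]
        X = torch.complex(Z[:, :C], Z[:, C:])
        global_branch = torch.fft.irfft2(X, s=x.shape[-2:], norm="ortho")
        return self.local(x) + global_branch                        # fuse local and global receptive fields
```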
arXiv Detail & Related papers (2022-04-06T18:52:47Z) - Functional Regularization for Reinforcement Learning via Learned Fourier
Features [98.90474131452588]
We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis.
We show that it improves the sample efficiency of both state-based and image-based RL.
arXiv Detail & Related papers (2021-12-06T18:59:52Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
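A heavily simplified sketch in the spirit of Fourier-domain token mixing: an FFT over the token dimension, a small complex-valued MLP shared across frequencies, and an inverse FFT. AFNO's block-diagonal weights and sparsity-promoting shrinkage are omitted, and all names here are illustrative.

```python
import torch


class FourierTokenMixer(torch.nn.Module):
    """Mix tokens in the Fourier domain with a small complex-valued MLP."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        scale = d_model ** -0.5
        self.W1 = torch.nn.Parameter(scale * torch.randn(d_model, hidden, dtype=torch.cfloat))
        self.W2 = torch.nn.Parameter(scale * torch.randn(hidden, d_model, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) real-valued tokens.
        X = torch.fft.rfft(x, dim=1)                               # (batch, freq, d_model) complex
        H = X @ self.W1                                            # channel mixing at every frequency
        H = torch.view_as_complex(torch.relu(torch.view_as_real(H)).contiguous())
        X = H @ self.W2
        return torch.fft.irfft(X, n=x.shape[1], dim=1)             # back to (batch, seq_len, d_model)
```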
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Fourier Features Let Networks Learn High Frequency Functions in Low
Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions.
Results shed light on advances in computer vision and graphics that achieve state-of-the-art results.
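The Fourier feature mapping described above is simple enough to state directly; the following is a minimal sketch with an assumed frequency scale, not the paper's released code.

```python
import math
import torch


def fourier_features(v: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # v: (N, d) low-dimensional coordinates, B: (m, d) fixed random projection.
    proj = 2.0 * math.pi * v @ B.T                                # (N, m)
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)  # (N, 2m) features fed to the MLP


B = torch.randn(256, 2) * 10.0   # the scale of B controls which frequencies the MLP can fit
features = fourier_features(torch.rand(1024, 2), B)
```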
arXiv Detail & Related papers (2020-06-18T17:59:11Z)