Resource-Efficient Separation Transformer
- URL: http://arxiv.org/abs/2206.09507v2
- Date: Mon, 15 Jan 2024 17:35:33 GMT
- Title: Resource-Efficient Separation Transformer
- Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell,
Frédéric Lepoutre, François Grondin
- Abstract summary: This paper explores Transformer-based speech separation with a reduced computational cost.
Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture.
The RE-SepFormer achieves competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings.
- Score: 14.666016177212837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have recently achieved state-of-the-art performance in speech
separation. These models, however, are computationally demanding and require a
large number of learnable parameters. This paper explores Transformer-based speech
separation with a reduced computational cost. Our main contribution is the
development of the Resource-Efficient Separation Transformer (RE-SepFormer), a
self-attention-based architecture that reduces the computational burden in two
ways. First, it uses non-overlapping blocks in the latent space. Second, it
operates on compact latent summaries calculated from each chunk. The
RE-SepFormer achieves competitive performance on the popular WSJ0-2Mix and
WHAM! datasets in both causal and non-causal settings. Remarkably, it scales
significantly better than the previous Transformer-based architectures in terms
of memory and inference time, making it more suitable for processing long
mixtures.
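As a rough illustration of the two cost-saving ideas in the abstract, the PyTorch sketch below splits the latent sequence into non-overlapping chunks, models each chunk locally, and runs the inter-chunk Transformer on one pooled summary vector per chunk. Mean pooling as the summary operator and the layer sizes are illustrative assumptions, not the paper's exact design:

```python
# Minimal sketch: (1) non-overlapping chunks in the latent space and
# (2) inter-chunk attention over one compact summary vector per chunk.
# Mean pooling is used as the summary operator purely for illustration.
import torch
import torch.nn as nn


class ChunkSummarySeparator(nn.Module):
    def __init__(self, dim=256, chunk_size=100, heads=8):
        super().__init__()
        self.chunk_size = chunk_size
        # Intra-chunk transformer: attention cost grows with chunk_size, not T.
        self.intra = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # Inter-chunk transformer: operates on one summary vector per chunk.
        self.inter = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (batch, T, dim) latent features
        b, t, d = x.shape
        pad = (-t) % self.chunk_size           # pad so T divides into whole chunks
        x = nn.functional.pad(x, (0, 0, 0, pad))
        n = x.shape[1] // self.chunk_size
        chunks = x.view(b * n, self.chunk_size, d)
        chunks = self.intra(chunks)            # local modelling inside each chunk
        summaries = chunks.mean(dim=1).view(b, n, d)   # one vector per chunk
        summaries = self.inter(summaries)      # global modelling over n << T tokens
        # Broadcast the refined summaries back to every frame of their chunk.
        out = chunks.view(b, n, self.chunk_size, d) + summaries.unsqueeze(2)
        return out.reshape(b, n * self.chunk_size, d)[:, :t]


x = torch.randn(2, 950, 256)
print(ChunkSummarySeparator()(x).shape)        # torch.Size([2, 950, 256])
```

Because the inter-chunk stage attends over only n summaries instead of all T frames, its cost no longer grows quadratically with the mixture length, which is what the abstract's memory and inference-time claims rest on.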
Related papers
- MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
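A minimal sketch of the layer-sharing idea (omitting MoEUT's mixture-of-experts components): one Transformer block is reused at every depth step, so the parameter count stays roughly constant as depth grows.

```python
# Layer sharing in the style of Universal Transformers: the same block's
# weights are applied at every depth step. Not the MoEUT architecture.
import torch
import torch.nn as nn


class SharedLayerTransformer(nn.Module):
    def __init__(self, dim=512, heads=8, steps=12):
        super().__init__()
        self.steps = steps
        self.block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):
        for _ in range(self.steps):    # same weights reused at every depth
            x = self.block(x)
        return x


def n_params(m):
    return sum(p.numel() for p in m.parameters())


shared = SharedLayerTransformer(steps=12)
unshared = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(512, 8, batch_first=True), num_layers=12)
print(n_params(shared), n_params(unshared))    # roughly 12x fewer parameters when shared
```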
arXiv Detail & Related papers (2024-05-25T03:24:32Z)
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
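To make the training/inference trade-off concrete, here is a toy, heavily simplified recurrence in the spirit of RWKV's attention replacement; the real model uses per-channel decays, a bonus term for the current token, and a numerically stabilised update:

```python
# Toy "WKV"-style recurrence: attention is replaced by an exponentially
# decaying weighted average of past values, so inference carries O(1) state
# per token instead of attending over the whole history.
import torch


def wkv_recurrent(k, v, decay=0.9):
    # k, v: (T, dim). Returns (T, dim) outputs from a running numerator/denominator.
    num = torch.zeros_like(v[0])
    den = torch.zeros_like(k[0])
    outs = []
    for t in range(k.shape[0]):
        w = torch.exp(k[t])
        num = decay * num + w * v[t]       # decayed weighted sum of values
        den = decay * den + w              # decayed sum of weights
        outs.append(num / (den + 1e-8))
    return torch.stack(outs)


k, v = torch.randn(16, 64), torch.randn(16, 64)
print(wkv_recurrent(k, v).shape)           # torch.Size([16, 64])
```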
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information from CNNs with the global context provided by the Transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
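A hypothetical minimal sketch of such a CNN/Transformer hybrid for block-based compressive sensing (the layer sizes, patching, and fusion scheme here are illustrative guesses, not CSformer's actual design):

```python
# Learned (adaptive) blockwise sampling, then recovery that fuses local CNN
# features with global Transformer context. Illustrative sizes only.
import torch
import torch.nn as nn


class TinyCSNet(nn.Module):
    def __init__(self, block=32, ratio=0.25, dim=64):
        super().__init__()
        m = int(block * block * ratio)                 # measurements per block
        self.sample = nn.Conv2d(1, m, kernel_size=block, stride=block)       # adaptive sampling
        self.init_rec = nn.ConvTranspose2d(m, 1, kernel_size=block, stride=block)  # initial recovery
        self.cnn = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(dim, dim, 3, padding=1))           # local spatial detail
        self.embed = nn.Conv2d(1, dim, kernel_size=8, stride=8)              # 8x8 patch tokens
        self.former = nn.TransformerEncoderLayer(dim, 4, batch_first=True)   # global context
        self.up = nn.Upsample(scale_factor=8, mode="nearest")
        self.out = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, x):                              # x: (B, 1, H, W), H and W multiples of 32
        y = self.sample(x)                             # compressed measurements
        x0 = self.init_rec(y)                          # initial reconstruction
        local = self.cnn(x0)
        tokens = self.embed(x0).flatten(2).transpose(1, 2)
        h, w = x0.shape[-2] // 8, x0.shape[-1] // 8
        glob = self.former(tokens).transpose(1, 2).reshape(x.shape[0], -1, h, w)
        return self.out(local + self.up(glob)) + x0    # fuse local and global, then refine


print(TinyCSNet()(torch.randn(1, 1, 64, 64)).shape)    # torch.Size([1, 1, 64, 64])
```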
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on the two commonly used datasets, Synapse and ACDC.
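A minimal sketch of what interleaving self-attention and convolution can look like for volumetric features (block design, normalisation, and down/upsampling are illustrative; nnFormer's actual architecture differs):

```python
# Convolutional and self-attention blocks alternate along the network depth.
import torch
import torch.nn as nn


class InterleavedBlock(nn.Module):
    def __init__(self, dim=48, heads=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv3d(dim, dim, 3, padding=1),
                                  nn.InstanceNorm3d(dim), nn.GELU())
        self.attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, D, H, W)
        x = self.conv(x)                       # local volumetric features
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, D*H*W, C) voxel tokens
        tokens = self.attn(tokens)             # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)


net = nn.Sequential(*[InterleavedBlock() for _ in range(3)])   # interleaved depth
print(net(torch.randn(1, 48, 8, 16, 16)).shape)                # torch.Size([1, 48, 8, 16, 16])
```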
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures could greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
- Decoupled Transformer for Scalable Inference in Open-domain Question Answering [0.0]
Large transformer models, such as BERT, achieve state-of-the-art results in machine reading comprehension (MRC) for open-domain question answering (QA).
In experiments on the SQuAD 2.0 dataset, a decoupled transformer reduces the computational cost and latency of open-domain MRC by 30-40% with only a 1.2-point drop in F1-score compared to a standard transformer.
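The decoupling idea can be sketched as follows: lower layers encode the document and the question independently, so document representations can be pre-computed and cached offline, and only a few upper layers run jointly at query time. Layer counts and dimensions below are illustrative, not the paper's configuration:

```python
# Lower (decoupled) layers run separately on document and question;
# upper (joint) layers model their interaction at query time.
import torch
import torch.nn as nn


def stack(n, dim=256, heads=4):
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=n)


lower, upper = stack(4), stack(2)              # 4 decoupled layers, 2 joint layers

doc = torch.randn(1, 300, 256)                 # document token embeddings
doc_cache = lower(doc)                         # computed once, stored offline

question = torch.randn(1, 16, 256)             # arrives at query time
q_enc = lower(question)                        # cheap: only 16 tokens
joint = upper(torch.cat([q_enc, doc_cache], dim=1))   # cross-interaction at the top
print(joint.shape)                             # torch.Size([1, 316, 256])
```

The latency saving comes from amortising the lower layers' document pass across all queries, which is consistent with the 30-40% reduction reported above.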
arXiv Detail & Related papers (2021-08-05T17:53:40Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
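As a toy illustration of Bayesian parameter estimation in such models, the sketch below replaces a point-estimated weight matrix with a Gaussian variational posterior trained via the reparameterisation trick; the paper's actual framework (which parameters are treated as Bayesian, the priors, and the inference scheme) is more involved:

```python
# A weight matrix with a Gaussian variational posterior (mean + log-variance),
# sampled with the reparameterisation trick instead of using a point estimate.
import torch
import torch.nn as nn


class BayesianLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_var = nn.Parameter(torch.full((d_out, d_in), -6.0))

    def forward(self, x):
        # Sample weights: w = mu + sigma * eps (one sample per forward pass).
        std = torch.exp(0.5 * self.log_var)
        w = self.mu + std * torch.randn_like(std)
        return x @ w.t()

    def kl(self):
        # KL divergence to a standard normal prior, used as a regulariser.
        return 0.5 * (self.mu ** 2 + self.log_var.exp() - self.log_var - 1).sum()


layer = BayesianLinear(512, 512)
out = layer(torch.randn(8, 512))
loss = out.pow(2).mean() + 1e-4 * layer.kl()   # task loss + KL term (ELBO-style)
print(out.shape, layer.kl().item() >= 0)
```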
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Attention is All You Need in Speech Separation [12.57578429586883]
We propose a novel RNN-free Transformer-based neural network for speech separation.
The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets.
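A minimal sketch of the RNN-free dual-path idea (chunk overlap, the masking network, and the encoder/decoder are omitted; sizes are illustrative): an intra-chunk Transformer models short-term structure inside each chunk, and an inter-chunk Transformer models long-term structure across chunks at each within-chunk position.

```python
# Dual-path processing with Transformers in place of RNNs.
import torch
import torch.nn as nn


class DualPathBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.intra = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.inter = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):                 # x: (batch, n_chunks, chunk_size, dim)
        b, n, s, d = x.shape
        x = self.intra(x.reshape(b * n, s, d)).reshape(b, n, s, d)   # within chunks
        x = x.transpose(1, 2)             # (batch, chunk_size, n_chunks, dim)
        x = self.inter(x.reshape(b * s, n, d)).reshape(b, s, n, d)   # across chunks
        return x.transpose(1, 2)          # back to (batch, n_chunks, chunk_size, dim)


x = torch.randn(2, 20, 100, 256)          # 20 chunks of 100 frames each
print(DualPathBlock()(x).shape)           # torch.Size([2, 20, 100, 256])
```

Note that, unlike the RE-SepFormer summary above, the inter-chunk stage here attends over full chunk contents at every position, which is where the higher memory and inference cost comes from.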
arXiv Detail & Related papers (2020-10-25T16:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.