End-to-End Multi-Channel Transformer for Speech Recognition
- URL: http://arxiv.org/abs/2102.03951v1
- Date: Mon, 8 Feb 2021 00:12:44 GMT
- Title: End-to-End Multi-Channel Transformer for Speech Recognition
- Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, and
Siegfried Kunzmann
- Abstract summary: We leverage neural transformer architectures for multi-channel speech recognition systems.
Our network consists of three parts: channel-wise self-attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA).
- Score: 9.949801888214527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are powerful neural architectures that allow different
modalities to be integrated using attention mechanisms. In this paper, we
leverage neural transformer architectures for multi-channel speech recognition
systems, where the spectral and spatial information collected from different
microphones is integrated using attention layers. Our multi-channel transformer
network mainly consists of three parts: channel-wise self-attention layers
(CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder
attention layers (EDA). The CSA and CCA layers encode the contextual
relationships within and between channels, respectively, across time. The
channel-attended outputs from CSA and CCA are then fed into the EDA layers to
help decode the next token given the preceding ones. Experiments on a far-field
in-house dataset show that our method outperforms the baseline single-channel
transformer, as well as super-directive and neural beamformers cascaded with
transformers.
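To make the roles of the three attention blocks concrete, here is a minimal
PyTorch sketch of one encoder layer (CSA followed by CCA) and one decoder layer
(EDA). This is an illustration under stated assumptions, not the paper's
implementation: the tensor layout, the averaged cross-channel key/value scheme
in CCA, the residual/LayerNorm placement, and the mean pooling over channels
before EDA are all assumptions made here.

```python
import torch
import torch.nn as nn

class MultiChannelEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # CSA: self-attention within each channel, across time
        self.csa = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # CCA: attention from each channel to the other channels, across time
        self.cca = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, channels, time, d_model)
        b, c, t, d = x.shape
        # CSA: fold the channel axis into the batch axis
        xs = x.reshape(b * c, t, d)
        csa_out, _ = self.csa(xs, xs, xs)
        xs = self.norm1(xs + csa_out)
        # CCA: each channel queries the average of the remaining channels
        # (a simplified cross-channel scheme, assumed for illustration)
        x = xs.reshape(b, c, t, d)
        others = (x.sum(dim=1, keepdim=True) - x) / max(c - 1, 1)
        q = x.reshape(b * c, t, d)
        kv = others.reshape(b * c, t, d)
        cca_out, _ = self.cca(q, kv, kv)
        return self.norm2(q + cca_out).reshape(b, c, t, d)

class DecoderLayerWithEDA(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.eda = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens, enc_out):
        # tokens: (batch, tgt_len, d_model); a causal mask would be added
        # for autoregressive decoding. enc_out: (batch, channels, time, d_model)
        memory = enc_out.mean(dim=1)  # channel pooling before EDA (assumption)
        sa, _ = self.self_attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + sa)
        # EDA: predict the next token by attending to channel-attended encodings
        ed, _ = self.eda(tokens, memory, memory)
        return self.norm2(tokens + ed)

# Example: 2 utterances, 4 microphones, 100 frames of 256-dim features
enc = MultiChannelEncoderLayer()
dec = DecoderLayerWithEDA()
feats = torch.randn(2, 4, 100, 256)
tokens = torch.randn(2, 10, 256)
out = dec(tokens, enc(feats))  # (2, 10, 256)
```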
Related papers
- Hierarchical Transformer for Electrocardiogram Diagnosis [1.4124476944967472]
Transformers, originally prominent in NLP and computer vision, are now being adapted for ECG signal analysis.
This paper introduces a novel hierarchical transformer architecture that segments the model into multiple stages.
A classification token aggregates information across feature scales, facilitating interactions between different stages of the transformer.
arXiv Detail & Related papers (2024-11-01T17:28:03Z)
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation [105.22961467028234]
Skip connections and normalisation layers are ubiquitous in the training of Deep Neural Networks (DNNs).
Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them.
But these approaches are incompatible with the self-attention layers present in transformers.
arXiv Detail & Related papers (2023-02-20T21:26:25Z)
- Adaptive Channel Encoding Transformer for Point Cloud Analysis [6.90125287791398]
A channel convolution called Transformer-Conv is designed to encode channel information.
It can encode feature channels by capturing the potential relationship between coordinates and features.
Our method is superior to state-of-the-art point cloud classification and segmentation methods on three benchmark datasets.
arXiv Detail & Related papers (2021-12-05T08:18:00Z)
- Multi-Channel End-to-End Neural Diarization with Distributed Microphones [53.99406868339701]
We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input.
We also propose a model adaptation method using only single-channel recordings.
arXiv Detail & Related papers (2021-10-10T03:24:03Z)
- Learning Signal Representations for EEG Cross-Subject Channel Selection and Trial Classification [0.3553493344868413]
We introduce an algorithm for subject-independent channel selection of EEG recordings.
It exploits channel-specific 1D-Convolutional Neural Networks (1D-CNNs) as feature extractors in a supervised fashion to maximize class separability.
After training, the algorithm can be applied to recordings from new subjects by transferring only the parametrized subgroup of selected channel-specific 1D-CNNs (a rough sketch of this idea follows the related-papers list).
arXiv Detail & Related papers (2021-06-20T06:22:16Z)
- UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z)
- Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering [51.633889765162685]
CHARM is a method for training a single neural network across inconsistent input channels.
We perform experiments on four EEG classification datasets and demonstrate the efficacy of CHARM.
arXiv Detail & Related papers (2020-10-21T12:32:34Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields so as to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and, consequently, network accuracy on fine-grained image recognition and instance-level image retrieval (a rough sketch of the channel-wise warping idea follows this list).
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
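For the EEG cross-subject channel-selection entry above, the following sketch
illustrates the general idea of channel-specific 1D-CNN feature extractors
trained jointly with a supervised classifier, with a learned per-channel score
used to pick which extractors to transfer to new subjects. The layer sizes, the
softmax channel score, and the top-k selection rule are illustrative
assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

class ChannelSpecificCNN(nn.Module):
    """One small 1D-CNN feature extractor per EEG channel."""
    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(8, feat_dim),
        )

    def forward(self, x):  # x: (batch, time)
        return self.net(x.unsqueeze(1))

class ChannelSelector(nn.Module):
    def __init__(self, n_channels: int, n_classes: int, feat_dim: int = 16):
        super().__init__()
        self.extractors = nn.ModuleList(
            [ChannelSpecificCNN(feat_dim) for _ in range(n_channels)])
        # Learned per-channel weight, reused as a selection score (assumption)
        self.channel_score = nn.Parameter(torch.zeros(n_channels))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):  # x: (batch, channels, time)
        feats = torch.stack(
            [f(x[:, i]) for i, f in enumerate(self.extractors)], dim=1)
        weights = torch.softmax(self.channel_score, dim=0)
        pooled = (weights[None, :, None] * feats).sum(dim=1)
        return self.classifier(pooled)

    def selected_channels(self, k: int):
        # After supervised training, keep only the k highest-scoring channels
        # and transfer their extractors to recordings from new subjects.
        return torch.topk(self.channel_score, k).indices
```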
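For the Volumetric Transformer Networks entry above, here is a rough sketch of
what channel-wise warping fields could look like: each channel gets its own 2D
offset field, predicted from the features and applied by grid sampling. The
offset head, the field parameterization, and the per-channel loop are
illustrative assumptions; the actual VTN design is specified in that paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelwiseWarp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a 2D offset field per channel from the features themselves
        self.offset_head = nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1)

    def forward(self, feat):  # feat: (batch, channels, H, W)
        b, c, h, w = feat.shape
        offsets = self.offset_head(feat).view(b, c, 2, h, w)
        # Base sampling grid in [-1, 1]; x comes first for grid_sample
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat.device),
            torch.linspace(-1, 1, w, device=feat.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)  # (H, W, 2)
        warped = []
        for ch in range(c):
            # Each channel is resampled with its own warping field
            grid = base + offsets[:, ch].permute(0, 2, 3, 1)  # (B, H, W, 2)
            warped.append(F.grid_sample(
                feat[:, ch:ch + 1], grid, align_corners=True))
        return torch.cat(warped, dim=1)
```

The per-channel loop keeps the sketch readable; a batched implementation would
fold channels into the batch dimension before a single grid_sample call.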
This list is automatically generated from the titles and abstracts of the papers in this site.