Conformer-based Hybrid ASR System for Switchboard Dataset
- URL: http://arxiv.org/abs/2111.03442v1
- Date: Fri, 5 Nov 2021 12:03:18 GMT
- Title: Conformer-based Hybrid ASR System for Switchboard Dataset
- Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried
  Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
- Abstract summary: We present and evaluate a competitive conformer-based hybrid model training recipe.
We study different training aspects and methods to improve word-error-rate as well as to increase training speed.
We conduct experiments on Switchboard 300h dataset and our conformer-based hybrid model achieves competitive results.
- Score: 99.88988282353206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed conformer architecture has been successfully used for
end-to-end automatic speech recognition (ASR) architectures achieving
state-of-the-art performance on different datasets. To the best of our knowledge,
the impact of using a conformer acoustic model for hybrid ASR has not been
investigated. In
this paper, we present and evaluate a competitive conformer-based hybrid model
training recipe. We study different training aspects and methods to improve
word-error-rate as well as to increase training speed. We apply time
downsampling methods for efficient training and use transposed convolutions to
upsample the output sequence again. We conduct experiments on the Switchboard
300h dataset, and our conformer-based hybrid model achieves competitive results
compared to other architectures. It generalizes very well to the Hub5'01 test set
and significantly outperforms the BLSTM-based hybrid model.
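The downsampling/upsampling idea from the abstract can be sketched in plain NumPy. Note this is only an illustrative shape exercise: the paper's recipe uses learned convolutions inside the conformer stack, and the factor 3 and all-ones kernel below are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def downsample(frames: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th frame along the time axis (axis 0)."""
    return frames[::factor]

def transposed_conv1d(x: np.ndarray, kernel: np.ndarray, stride: int) -> np.ndarray:
    """Minimal 1-D transposed convolution: each input frame scatters a
    scaled copy of the kernel into the output at `stride` spacing.
    Output length: (len(x) - 1) * stride + len(kernel)."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

# Toy 1-D "feature" sequence of 12 frames: downsample by 3 for cheaper
# encoder computation, then upsample back so the output again aligns
# with the frame-level HMM targets a hybrid system needs.
x = np.arange(12, dtype=float)
down = downsample(x, 3)                                     # 4 frames
up = transposed_conv1d(down, kernel=np.ones(3), stride=3)   # 12 frames
print(down.shape, up.shape)
```

With stride equal to the downsampling factor and a kernel of the same length, the transposed convolution restores exactly the original sequence length, which is the property the recipe relies on.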
Related papers
- Hybrid Training Approaches for LLMs: Leveraging Real and Synthetic Data to Enhance Model Performance in Domain-Specific Applications [0.0]
This research explores a hybrid approach to fine-tuning large language models (LLMs).
By leveraging a dataset combining transcribed real interactions with high-quality synthetic sessions, we aimed to overcome the limitations of domain-specific real data.
The study evaluated three models: a base foundational model, a model fine-tuned with real data, and a hybrid fine-tuned model.
arXiv Detail & Related papers (2024-10-11T18:16:03Z)
- Test-time Training for Hyperspectral Image Super-resolution [95.38382633281398]
Hyperspectral image (HSI) super-resolution (SR) still lags behind research on RGB image SR.
In this work, we propose a new test-time training method to tackle this problem.
Specifically, a novel self-training framework is developed, where more accurate pseudo-labels and more accurate LR-HR relationships are generated.
arXiv Detail & Related papers (2024-09-13T09:30:19Z)
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- A Mixture of Expert Based Deep Neural Network for Improved ASR [4.993304210475779]
MixNet is a novel deep learning architecture for acoustic modeling in the context of Automatic Speech Recognition (ASR).
In natural speech, overlap in distribution across different acoustic classes is inevitable, which leads to inter-class mis-classification.
Experiments conducted on a large-vocabulary ASR task show that the proposed architecture provides 13.6% and 10.0% relative reductions in word error rate.
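Relative WER reduction, as reported above, is measured against a baseline system. The formula below is standard; the 15.0% baseline and 12.96% improved WER are hypothetical numbers chosen only to illustrate how a 13.6% relative figure arises, not values from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: the fraction of the baseline's
    error that was removed, scaled to a percentage."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: improving a 15.0% WER baseline to 12.96% WER
# is a 13.6% relative reduction (but only a 2.04% absolute reduction).
print(round(relative_wer_reduction(15.0, 12.96), 1))
```

Keeping relative and absolute reductions distinct matters when comparing papers: the CHiME-6 result later in this list is quoted in absolute WER, not relative.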
arXiv Detail & Related papers (2021-12-02T07:26:34Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with an improved beam search, reaches a quality only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
- Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition [33.032361181388886]
We provide an overview of distributed training techniques for deep neural network acoustic models for ASR.
Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated strategies.
arXiv Detail & Related papers (2020-02-24T19:31:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.