End-to-End Automatic Speech Recognition with Deep Mutual Learning
- URL: http://arxiv.org/abs/2102.08154v1
- Date: Tue, 16 Feb 2021 13:52:06 GMT
- Title: End-to-End Automatic Speech Recognition with Deep Mutual Learning
- Authors: Ryo Masumura, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Takanori
Ashihara
- Abstract summary: This paper is the first to apply deep mutual learning to end-to-end ASR models.
In DML, multiple models are trained simultaneously and collaboratively by mimicking each other throughout the training process.
We demonstrate that DML improves the ASR performance of both modeling setups compared with conventional learning methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is the first study to apply deep mutual learning (DML) to
end-to-end ASR models. In DML, multiple models are trained simultaneously and
collaboratively by mimicking each other throughout the training process, which
helps to attain the global optimum and prevent models from making
over-confident predictions. While previous studies applied DML to simple
multi-class classification problems, there are no studies that have used it on
more complex sequence-to-sequence mapping problems. For this reason, this paper
presents a method to apply DML to state-of-the-art Transformer-based end-to-end
ASR models. In particular, we propose to combine DML with recent representative
training techniques. i.e., label smoothing, scheduled sampling, and
SpecAugment, each of which are essential for powerful end-to-end ASR models. We
expect that these training techniques work well with DML because DML has
complementary characteristics. We experimented with two setups for Japanese ASR
tasks: large-scale modeling and compact modeling. We demonstrate that DML
improves the ASR performance of both modeling setups compared with conventional
learning methods including knowledge distillation. We also show that combining
DML with the existing training techniques effectively improves ASR performance.
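The mutual-learning objective described in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: it assumes the common DML formulation in which each model minimizes a (label-smoothed) cross-entropy plus a KL term toward its peer's output distribution, with the peer's distribution held fixed. The function names and the smoothing value are illustrative.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), summed over the class axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def dml_losses(logits_a, logits_b, targets, num_classes, smoothing=0.1):
    """Per-example DML losses for two peer models (sketch).

    Each model minimizes cross-entropy against a label-smoothed target,
    plus a KL term that mimics the other model's output distribution
    (the peer's distribution is treated as a constant, as in DML).
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    # label smoothing: (1 - eps) mass on the gold label, eps spread uniformly
    one_hot = np.eye(num_classes)[targets]
    smooth = one_hot * (1.0 - smoothing) + smoothing / num_classes
    ce_a = -np.sum(smooth * np.log(p_a + 1e-12), axis=-1)
    ce_b = -np.sum(smooth * np.log(p_b + 1e-12), axis=-1)
    # mutual mimicry: each model matches the other's (fixed) distribution
    loss_a = ce_a + kl_div(p_b, p_a)
    loss_b = ce_b + kl_div(p_a, p_b)
    return loss_a.mean(), loss_b.mean()
```

In an actual Transformer ASR setup these logits would be per-token decoder outputs and both models would be updated in the same training step; the sketch only shows how the mimicry term combines with label-smoothed cross-entropy.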
Related papers
- Model Composition for Multimodal Large Language Models [73.70317850267149]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
- Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning [13.964106147449051]
Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets.
We propose a novel and effective framework based on learning Visual Prompts (VPT) in the pre-trained Vision Transformers (ViT).
We demonstrate that our new approximations with semantic information are superior to representative capabilities.
arXiv Detail & Related papers (2024-02-04T04:42:05Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Task Aware Modulation using Representation Learning: An Approach for Few Shot Learning in Heterogeneous Systems [16.524898421921108]
TAM-RL is a framework that enhances personalized predictions in few-shot settings for heterogeneous systems.
We show that TAM-RL can significantly outperform existing baseline approaches such as MAML and multi-modal MAML.
We show that TAM-RL significantly improves predictive performance for cases where it is possible to learn distinct representations for different tasks.
arXiv Detail & Related papers (2023-10-07T07:55:22Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
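The merging rule sketched below is a hedged illustration of the idea in this blurb, not AdaMerging's actual implementation: a merged model is formed from a pre-trained checkpoint plus coefficient-weighted task vectors (fine-tuned minus pre-trained weights), where AdaMerging learns the coefficients (e.g. by entropy minimization on unlabeled test data) rather than fixing them as plain task arithmetic does. All names here are illustrative.

```python
import numpy as np

def adamerge(pretrained, finetuned_models, coeffs):
    """Task-wise model merging with per-task coefficients (sketch).

    pretrained / finetuned_models: dicts mapping layer name -> weight array.
    coeffs: one merging coefficient per task; AdaMerging learns these
    instead of using a single fixed scaling as in task arithmetic.
    A layer-wise variant would use one coefficient per (task, layer) pair.
    """
    merged = {}
    for name, w0 in pretrained.items():
        # task vector: what fine-tuning added on top of the pre-trained weights
        task_vectors = [ft[name] - w0 for ft in finetuned_models]
        merged[name] = w0 + sum(c * tv for c, tv in zip(coeffs, task_vectors))
    return merged
```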
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt an MLM task, a classification task, and a contrastive learning task.
In the fine-tuning stage, we use confident learning, the exponential moving average method (EMA), adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning [100.14809391594109]
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning.
Despite the generalization power of the meta-model, it remains elusive how adversarial robustness can be maintained by MAML in few-shot learning.
We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning.
arXiv Detail & Related papers (2021-02-20T22:03:04Z)
- Revisiting Training Strategies and Generalization Performance in Deep Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices.
Under consistent comparison, DML objectives show much higher saturation than indicated by the literature.
Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.