Sequence-to-sequence models in peer-to-peer learning: A practical application
- URL: http://arxiv.org/abs/2406.02565v1
- Date: Thu, 2 May 2024 14:44:06 GMT
- Title: Sequence-to-sequence models in peer-to-peer learning: A practical application
- Authors: Robert Šajina, Ivo Ipšić
- Abstract summary: The paper explores the applicability of sequence-to-sequence (Seq2Seq) models based on LSTM units to the Automatic Speech Recognition (ASR) task within peer-to-peer learning environments.
The findings demonstrate the feasibility of employing Seq2Seq models in decentralized settings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the applicability of sequence-to-sequence (Seq2Seq) models based on LSTM units to the Automatic Speech Recognition (ASR) task within peer-to-peer learning environments. Leveraging two distinct peer-to-peer learning methods, the study simulates the learning process of agents and evaluates their performance on the ASR task using two different ASR datasets. In a centralized training setting, utilizing a scaled-down variant of the Deep Speech 2 model, a single model achieved a Word Error Rate (WER) of 84% when trained on the UserLibri dataset, and 38% when trained on the LJ Speech dataset. Conversely, in a peer-to-peer learning scenario involving 55 agents, the WER ranged from 87% to 92% for the UserLibri dataset, and from 52% to 56% for the LJ Speech dataset. The findings demonstrate the feasibility of employing Seq2Seq models in decentralized settings, albeit with slightly higher Word Error Rates (WER) compared to centralized training methods.
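The Word Error Rate figures quoted in the abstract can be read against the standard definition of the metric: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the ratio can exceed 1.0 when the hypothesis is much longer than the reference, so a WER is not bounded by 100%.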
Related papers
- STTATTS: Unified Speech-To-Text And Text-To-Speech Model [6.327929516375736]
We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters.
Our evaluation demonstrates that the performance of our multi-task model is comparable to that of individually trained models.
arXiv Detail & Related papers (2024-10-24T10:04:24Z)
- Efficient data selection employing Semantic Similarity-based Graph Structures for model training [1.5845679507219355]
This paper introduces Semantics for data SAliency in Model performance Estimation (SeSaME).
It is an efficient data sampling mechanism solely based on textual information without passing the data through a compute-heavy model.
The application of this approach is demonstrated in the use case of low-resource automated speech recognition (ASR) models.
arXiv Detail & Related papers (2024-02-22T09:43:53Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Data Curation Alone Can Stabilize In-context Learning [20.874674130060388]
In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
Randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
arXiv Detail & Related papers (2022-12-20T15:58:54Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- LiST: Lite Self-training Makes Efficient Few-shot Learners [91.28065455714018]
LiST improves by 35% over classic fine-tuning methods and 6% over prompt-tuning with 96% reduction in number of trainable parameters when fine-tuned with no more than 30 labeled examples from each target domain.
arXiv Detail & Related papers (2021-10-12T18:47:18Z)
- Environmental sound analysis with mixup based multitask learning and cross-task fusion [0.12891210250935145]
Acoustic scene classification and acoustic event classification are two closely related tasks.
In this letter, a two-stage method is proposed for the above tasks.
The proposed method has confirmed the complementary characteristics of acoustic scene and acoustic event classifications.
arXiv Detail & Related papers (2021-03-30T05:11:53Z)
- Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model [18.26945997660616]
Many downstream tasks and human readers rely on the output of the ASR system.
We propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text.
arXiv Detail & Related papers (2021-02-22T15:45:50Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and the LU systems that follow it can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)