Related papers: Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

URL: http://arxiv.org/abs/2411.03866v1
Date: Wed, 06 Nov 2024 12:22:04 GMT
Title: Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Authors: Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke,
Abstract summary: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions. We present novel empirical findings that offer insights on how to effectively utilize the SLAM-ASR architecture across a wide range of settings.
Score: 10.914414815406275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and different speech perturbations. In this paper, we address these questions by conducting various ablation experiments using a recent and widely adopted approach called SLAM-ASR. We present novel empirical findings that offer insights on how to effectively utilize the SLAM-ASR architecture across a wide range of settings. Our main findings indicate that the SLAM-ASR exhibits poor performance in cross-domain evaluation settings. Additionally, speech perturbations within in-domain data, such as changes in speed or the presence of additive noise, can significantly impact performance. Our findings offer critical insights for fine-tuning and configuring robust LLM-based ASR models, tailored to different data characteristics and computational resources.

Related papers

Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts [0.0]
We investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions.<n>Our results suggest that noise-aware training improves in some configurations.
arXiv Detail & Related papers (2025-07-30T22:14:16Z)
Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability [0.0]
This study benchmarks seven Akan automatic speech recognition (ASR) models built on transformer architectures.<n>It shows distinct error behaviors between the Whisper and Wav2Vec2 architectures.<n>These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks.
arXiv Detail & Related papers (2025-07-03T08:01:26Z)
Towards Generalized Range-View LiDAR Segmentation in Adverse Weather [65.22588361803942]
We identify and analyze the unique challenges that affect the generalization of range-view LiDAR segmentation in severe weather.<n>We propose a modular and lightweight framework that enhances robustness without altering the core architecture of existing models.<n>Our approach significantly improves generalization to adverse weather with minimal inference overhead.
arXiv Detail & Related papers (2025-06-10T16:48:27Z)
Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations [4.39614901077936]
Large Language Models (LLMs) have generated growing interest in their structured reasoning capabilities. The Abstraction and Reasoning Corpus benchmark plays a crucial role in evaluating these capabilities by testing how well AI models generalize to novel problems. This work underscores the need for developing more robust and adaptable AI systems capable of handling the ambiguity and variability inherent in real-world scenarios.
arXiv Detail & Related papers (2025-04-22T13:43:58Z)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs) We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z)
Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity [61.48338027901318]
We show that fine-tuning with LLM-generated data improves target task performance and reduces out-of-domain degradation. This is the first mechanistic explanation for the superior OOD robustness conferred by LLM-generated training data.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting [40.78026627009521]
Reinforcement learning (RL) is a promising approach for aligning large language models (LLMs) knowledge with sequential decision-making tasks. We propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment.
arXiv Detail & Related papers (2024-10-25T18:25:35Z)
SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based offline reinforcement learning trains policies using pre-collected datasets and learned environment models. This paper offers a comprehensive analysis that disentangles the problem into two fundamental components: model bias and policy shift. We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate SAR for policy optimization.
arXiv Detail & Related papers (2024-08-23T04:25:09Z)
Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities. DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries. This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines. We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z)
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training [39.21885486667879]
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. We propose a novel RAG approach known as Retrieval-augmented Adaptive Adrial Training (RAAT)
arXiv Detail & Related papers (2024-05-31T16:24:53Z)
Crossmodal ASR Error Correction with Discrete Speech Units [16.58209270191005]
We propose a post-ASR processing approach for ASR Error Correction (AEC) We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon. We propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality.
arXiv Detail & Related papers (2024-05-26T19:58:38Z)
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, architectures and other factors profoundly impact the performance of LLMs. Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods. This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering technique.
arXiv Detail & Related papers (2024-03-22T14:47:35Z)
RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems [51.171355532527365]
Retrieval-augmented generation (RAG) can significantly improve the performance of language models (LMs) RAGGED is a framework for analyzing RAG configurations across various document-based question answering tasks.
arXiv Detail & Related papers (2024-03-14T02:26:31Z)
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
Investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise. We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re- parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU) Recent works have shown that using extra data and labels can improve the OOD detection performance. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.