Related papers: SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

URL: http://arxiv.org/abs/2309.01437v1
Date: Mon, 4 Sep 2023 08:35:05 GMT
Title: SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Authors: Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng
Abstract summary: We introduce sememe-based semantic knowledge information to speech recognition. Our experiments show that sememe information can improve the effectiveness of speech recognition. In addition, our further experiments show that sememe knowledge can improve the model's recognition of long-tailed data.
Score: 58.979490858061745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, excellent progress has been made in speech recognition. However, pure data-driven approaches have struggled to solve the problem in domain-mismatch and long-tailed data. Considering that knowledge-driven approaches can help data-driven approaches alleviate their flaws, we introduce sememe-based semantic knowledge information to speech recognition (SememeASR). Sememe, according to the linguistic definition, is the minimum semantic unit in a language and is able to represent the implicit semantic information behind each word very well. Our experiments show that the introduction of sememe information can improve the effectiveness of speech recognition. In addition, our further experiments show that sememe knowledge can improve the model's recognition of long-tailed data and enhance the model's domain generalization ability.

Related papers

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning [0.23408308015481666]
We adapt and train a YAMNet deep learning model to effectively detect and interpret speech commands from audio signals. The final model achieved a recognition accuracy of 95.28%, underscoring the impact of advanced machine learning techniques.
arXiv Detail & Related papers (2025-04-26T21:57:11Z)
Retrieval-Augmented Speech Recognition Approach for Domain Challenges [24.337617843696286]
Speech recognition systems often face challenges due to domain mismatch. Inspired by Retrieval-Augmented Generation (RAG) techniques for large language models (LLMs), this paper introduces an LLM-based retrieval-augmented speech recognition method.
arXiv Detail & Related papers (2025-02-21T07:47:50Z)
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology [4.080686348274667]
We introduce a novel approach combining unsupervised contrastive learning and a unique augmentation-based technique. Our method allows the neural network to train on unlabeled data sets, potentially improving performance in downstream tasks. We present a speech augmentation-based unsupervised learning method that utilizes the similarity between the bottleneck layer feature and the audio reconstructing information.
arXiv Detail & Related papers (2024-08-31T05:40:37Z)
Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge. The semantic knowledge available in the label information of the images offers important semantic information that can be related to previously acquired knowledge of semantic classes. We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z)
Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition. We use a memory-enhanced ASR model from the literature to decode new words from the slides. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
Enhancing Context Through Contrast [0.4068270792140993]
We propose a novel Context Enhancement step to improve performance on neural machine translation. Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations. Our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings.
arXiv Detail & Related papers (2024-01-06T22:13:51Z)
Disentangling Learnable and Memorizable Data via Contrastive Learning for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready. In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data. Deep semantic clusters of highest confidence are considered learnable, semantic-rich data. Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z)
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information. Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models. Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
On the Use of External Data for Spoken Named Entity Recognition [40.93448412171246]
Recent advances in self-supervised speech representations have made it feasible to consider learning models with limited labeled data. We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches.
arXiv Detail & Related papers (2021-12-14T18:49:26Z)
Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems [22.387120578306277]
This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph. Our experiments with a large dataset demonstrate that this new semantic version of TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance.
arXiv Detail & Related papers (2021-12-08T16:23:27Z)
Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts. We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
arXiv Detail & Related papers (2020-10-05T02:14:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.