Related papers: Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

URL: http://arxiv.org/abs/2501.07329v2
Date: Fri, 17 Jan 2025 17:53:27 GMT
Title: Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Authors: Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang,
Abstract summary: We propose a joint speech recognition and structure learning framework (JSRSL), which can accurately transcribe speech and extract structured content simultaneously.<n>The results show that our proposed method outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities.
Score: 25.986288893402225
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, This method is not suitable for simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end SLU model based on span, which can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on name entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on the two datasets.

Related papers

dMel: Speech Tokenization made Simple [19.169460770473908]
We show that discretizing mel-filterbank channels into discrete intensity bins produces a simple representation (dMel) Our results demonstrate the effectiveness of dMel in achieving high performance on both tasks within a unified framework.
arXiv Detail & Related papers (2024-07-22T17:51:53Z)
Prompting Large Language Models with Audio for General-Purpose Speech Summarization [13.415189715216354]
We introduce a framework for speech summarization that leverages the processing and reasoning capabilities of large language models (LLMs) We propose an end-to-end system that combines an instruction-tuned LLM with an audio encoder that converts speech into token representations that the LLM can interpret.
arXiv Detail & Related papers (2024-06-10T02:04:28Z)
Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis. Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space. The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z)
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community. There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
Meta Auxiliary Learning for Low-resource Spoken Language Understanding [11.002938634213734]
Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task. We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU task.
arXiv Detail & Related papers (2022-06-26T03:12:33Z)
Unified Speech-Text Pre-training for Speech Translation and Recognition [113.31415771943162]
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. The proposed method incorporates four self-supervised and supervised subtasks for cross modality learning. It achieves between 1.7 and 2.3 BLEU improvement above the state of the art on the MuST-C speech translation dataset.
arXiv Detail & Related papers (2022-04-11T20:59:51Z)
Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning [4.327558819000435]
We propose a novel joint textual-phonetic pre-training approach for learning spoken language representations. Experimental results on spoken language understanding benchmarks, Fluent Speech Commands and SNIPS, show that the proposed approach significantly outperforms strong baseline models.
arXiv Detail & Related papers (2021-04-21T05:19:13Z)
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions. Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text. We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.