Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models
- URL: http://arxiv.org/abs/2206.14589v1
- Date: Wed, 29 Jun 2022 12:49:53 GMT
- Authors: Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
- Abstract summary: In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In Spoken Language Understanding (SLU) the task is to extract important
information from audio commands, such as the intent of what a user wants the
system to do and special entities like locations or numbers. This paper
presents a simple method for embedding intents and entities into Finite State
Transducers which, in combination with a pretrained general-purpose
Speech-to-Text model, allows building SLU models without any additional
training. Building these models is very fast and only takes a few seconds,
and the approach is completely language-independent. A comparison on several
benchmarks shows that this method can outperform multiple other, more
resource-demanding SLU approaches.
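The abstract's core idea, mapping recognized word sequences to intents and entities through a finite state transducer, can be illustrated with a minimal sketch. This is not the paper's implementation; the class, grammar, state numbers, and labels below are all invented for illustration, and a real system (e.g. one built on OpenFst) would compose the transducer with the Speech-to-Text model's output lattice rather than a single transcript.

```python
# Toy finite state transducer: walks a word sequence through hand-built
# arcs, collecting entity labels along the way, and reports the intent
# attached to the final state reached. Purely illustrative.

class SimpleFST:
    def __init__(self):
        # transitions[state][word] -> (next_state, entity_label or None)
        self.transitions = {}
        # finals[state] -> intent name accepted at that state
        self.finals = {}

    def add_arc(self, state, word, next_state, label=None):
        self.transitions.setdefault(state, {})[word] = (next_state, label)

    def set_final(self, state, intent):
        self.finals[state] = intent

    def parse(self, words):
        state, entities = 0, {}
        for w in words:
            arcs = self.transitions.get(state, {})
            if w not in arcs:
                return None  # utterance not covered by the grammar
            state, label = arcs[w]
            if label is not None:
                entities[label] = w  # this word fills an entity slot
        intent = self.finals.get(state)
        return (intent, entities) if intent is not None else None

# Hypothetical grammar for "turn on the light in the <location>":
fst = SimpleFST()
fst.add_arc(0, "turn", 1)
fst.add_arc(1, "on", 2)
fst.add_arc(2, "the", 3)
fst.add_arc(3, "light", 4)
fst.add_arc(4, "in", 5)
fst.add_arc(5, "the", 6)
for room in ("kitchen", "bedroom"):
    fst.add_arc(6, room, 7, label="location")
fst.set_final(7, "SwitchOn")

print(fst.parse("turn on the light in the kitchen".split()))
# -> ('SwitchOn', {'location': 'kitchen'})
```

Because the transducer is just a small table of arcs, compiling a grammar like this takes well under a second, which is consistent with the paper's claim that building such SLU models takes only a few seconds and requires no training.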
Related papers
- Improving Textless Spoken Language Understanding with Discrete Units as
Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Bridging Speech and Textual Pre-trained Models with Unsupervised ASR [70.61449720963235]
This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models.
We show that unsupervised automatic speech recognition (ASR) can improve the representations from speech self-supervised models.
Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark.
arXiv Detail & Related papers (2022-11-06T04:50:37Z)
- WaBERT: A Low-resource End-to-end Model for Spoken Language
Understanding and Speech-to-BERT Alignment [2.7505260301752763]
We propose a novel end-to-end model combining the speech model and the language model for SLU tasks.
WaBERT is based on the pre-trained speech and language model, hence training from scratch is not needed.
arXiv Detail & Related papers (2022-04-22T02:14:40Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet less-explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language
Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.