Text is no more Enough! A Benchmark for Profile-based Spoken Language
Understanding
- URL: http://arxiv.org/abs/2112.11953v1
- Date: Wed, 22 Dec 2021 15:22:17 GMT
- Title: Text is no more Enough! A Benchmark for Profile-based Spoken Language
Understanding
- Authors: Xiao Xu, Libo Qin, Kaiji Chen, Guoxing Wu, Linlin Li, Wanxiang Che
- Abstract summary: Profile-based Spoken Language Understanding (ProSLU) requires a model to rely not only on the plain text but also on supporting profile information to predict the correct intents and slots.
We introduce a large-scale human-annotated Chinese dataset with over 5K utterances and their corresponding supporting profile information.
Experimental results reveal that all existing text-based SLU models fail to work when the utterances are semantically ambiguous.
- Score: 26.549776399115203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current research on spoken language understanding (SLU) is largely
limited to a simple setting: plain text-based SLU that takes the user utterance
as input and generates its corresponding semantic frames (e.g., intent and
slots). Unfortunately, such a simple setting may fail in complex real-world
scenarios where an utterance is semantically ambiguous and cannot be resolved
by text-based SLU models alone. In this paper, we first introduce a new and
important task, Profile-based Spoken Language Understanding (ProSLU), which
requires a model to rely not only on the plain text but also on supporting
profile information to predict the correct intents and slots. To
this end, we further introduce a large-scale human-annotated Chinese dataset
with over 5K utterances and their corresponding supporting profile information
(Knowledge Graph (KG), User Profile (UP), Context Awareness (CA)). In addition,
we evaluate several state-of-the-art baseline models and explore a multi-level
knowledge adapter to effectively incorporate profile information. Experimental
results reveal that all existing text-based SLU models fail to work when the
utterances are semantically ambiguous and our proposed framework can
effectively fuse the supporting information for sentence-level intent detection
and token-level slot filling. Finally, we summarize key challenges and outline
new directions for future work, which we hope will facilitate further research.
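The abstract describes a multi-level knowledge adapter that fuses supporting profile information (KG, UP, CA) with the utterance for sentence-level intent detection and token-level slot filling. The sketch below is a minimal, hypothetical PyTorch illustration of that fusion idea, not the authors' implementation; the module names, dimensions, and the attention-based fusion over pre-encoded profile vectors are all assumptions.

```python
# Minimal sketch (not the paper's exact architecture) of fusing profile
# vectors (KG / UP / CA) into an utterance encoder for intent detection
# and slot filling. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class ProfileFusionSLU(nn.Module):
    def __init__(self, vocab_size=5000, hidden=256,
                 n_intents=10, n_slots=20, n_profiles=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden // 2, batch_first=True,
                               bidirectional=True)
        # One learned projection per profile source (KG, UP, CA).
        self.profile_proj = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(n_profiles)])
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)
        self.intent_head = nn.Linear(hidden, n_intents)
        self.slot_head = nn.Linear(hidden, n_slots)

    def forward(self, tokens, profiles):
        # tokens:   (batch, seq_len) token ids of the utterance
        # profiles: (batch, n_profiles, hidden) pre-encoded KG/UP/CA vectors
        h, _ = self.encoder(self.embed(tokens))            # (B, T, H)
        keys = torch.stack([proj(profiles[:, i])           # project each source
                            for i, proj in enumerate(self.profile_proj)], dim=1)
        # Every token attends over the profile sources; the result is added
        # back so ambiguous tokens can be disambiguated by profile knowledge.
        fused, _ = self.attn(h, keys, keys)                 # (B, T, H)
        h = h + fused
        intent_logits = self.intent_head(h.mean(dim=1))     # sentence level
        slot_logits = self.slot_head(h)                      # token level
        return intent_logits, slot_logits


if __name__ == "__main__":
    model = ProfileFusionSLU()
    tokens = torch.randint(0, 5000, (2, 12))
    profiles = torch.randn(2, 3, 256)   # stand-in KG, UP, CA encodings
    intents, slots = model(tokens, profiles)
    print(intents.shape, slots.shape)   # (2, 10) (2, 12, 20)
```

The key design point the sketch tries to convey is that profile information is injected at the token level, so the same supporting vectors can inform both the sentence-level intent decision and each slot label.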
Related papers
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
arXiv Detail & Related papers (2022-06-29T00:36:34Z)
- DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval [40.70100506088116]
We propose a novel Deep Surroundings-person Separation Learning (DSSL) model in this paper.
A surroundings-person separation and fusion mechanism plays the key role in realizing accurate and effective surroundings-person separation.
Extensive experiments are carried out to evaluate the proposed DSSL on CUHK-PEDES.
arXiv Detail & Related papers (2021-09-12T15:09:09Z)