NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural
Language Understanding in Task-Oriented Dialogue
- URL: http://arxiv.org/abs/2204.13021v2
- Date: Thu, 28 Apr 2022 08:33:13 GMT
- Title: NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural
Language Understanding in Task-Oriented Dialogue
- Authors: Iñigo Casanueva, Ivan Vulić, Georgios Spithourakis, Paweł Budzianowski
- Abstract summary: NLU++ is a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems.
NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets.
- Score: 53.54788957697192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present NLU++, a novel dataset for natural language understanding (NLU) in
task-oriented dialogue (ToD) systems, with the aim to provide a much more
challenging evaluation environment for dialogue NLU models, up to date with the
current application and industry requirements. NLU++ is divided into two
domains (BANKING and HOTELS) and brings several crucial improvements over
current commonly used NLU datasets. 1) NLU++ provides fine-grained domain
ontologies with a large set of challenging multi-intent sentences, introducing
and validating the idea of intent modules that can be combined into complex
intents that convey complex user goals, combined with finer-grained and thus
more challenging slot sets. 2) The ontology is divided into domain-specific and
generic (i.e., domain-universal) intent modules that overlap across domains,
promoting cross-domain reusability of annotated examples. 3) The dataset design
has been inspired by the problems observed in industrial ToD systems, and 4) it
has been collected, filtered and carefully annotated by dialogue NLU experts,
yielding high-quality annotated data. Finally, we benchmark a series of current
state-of-the-art NLU models on NLU++; the results demonstrate the challenging
nature of the dataset, especially in low-data regimes, the validity of 'intent
modularisation', and call for further research on ToD NLU.
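The intent-module design described in the abstract can be sketched in a few lines; note that all module names, the example utterance, and the micro-F1 metric below are illustrative assumptions, not the actual NLU++ ontology or its official evaluation setup.

```python
# Sketch of multi-label intents built from generic and domain-specific
# modules (module/intent names are illustrative, not the real NLU++ ontology).

GENERIC_MODULES = {"request_info", "change", "cancel"}  # reusable across domains
BANKING_MODULES = {"transfer", "card"}                  # domain-specific
HOTELS_MODULES = {"booking", "room"}

def valid_labels(labels, domain_modules):
    """A label set is valid if every module is generic or in-domain."""
    allowed = GENERIC_MODULES | domain_modules
    return set(labels) <= allowed

# One utterance can carry several modules at once (multi-intent):
example = {
    "text": "Can I change my card details and cancel the transfer?",
    "intents": ["change", "card", "cancel", "transfer"],
}
assert valid_labels(example["intents"], BANKING_MODULES)

# Multi-label evaluation: micro-F1 over binary indicators per module.
def micro_f1(gold_sets, pred_sets):
    tp = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    fp = sum(len(p - g) for g, p in zip(gold_sets, pred_sets))
    fn = sum(len(g - p) for g, p in zip(gold_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn)

gold = [{"change", "card"}, {"cancel", "transfer"}]
pred = [{"change", "card"}, {"cancel"}]
print(round(micro_f1(gold, pred), 3))  # 0.857
```

The point of the modular split is visible in `valid_labels`: the generic modules can be reused unchanged when moving from BANKING to HOTELS, so only the domain-specific set needs re-annotation.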
Related papers
- Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose a multi-level MMCL framework to apply contrastive learning at three levels, including utterance level, slot level, and word level.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
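The multi-level contrastive idea summarised above can be illustrated with a generic InfoNCE sketch; this is a standard contrastive loss applied at toy "levels", not the paper's exact MMCL objective, and the similarity scores below are invented for illustration.

```python
import math

def info_nce(sim_row, pos_idx, temperature=0.1):
    """Generic InfoNCE loss for one anchor: sim_row[i] is the similarity
    between the anchor and candidate i; pos_idx marks the positive."""
    logits = [s / temperature for s in sim_row]
    m = max(logits)                      # stabilise the softmax
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[pos_idx] / sum(exps))

# The framework applies contrastive learning at three granularities;
# here we only mimic that with toy similarity scores per level.
levels = {
    "utterance": ([0.9, 0.1, 0.2], 0),  # anchor vs. utterance candidates
    "slot":      ([0.2, 0.8, 0.1], 1),
    "word":      ([0.1, 0.3, 0.7], 2),
}
total = sum(info_nce(sims, pos) for sims, pos in levels.values())
print(round(total, 3))  # small positive loss: positives dominate each row
```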
arXiv Detail & Related papers (2024-05-31T14:34:23Z)
- SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU [21.805799634495486]
SQATIN is a new framework for dialogue NLU based on (i) instruction tuning and (ii) a question-answering-based formulation of the intent detection (ID) and value extraction (VE) tasks.
SQATIN sets the new state of the art in dialogue NLU, substantially surpassing the performance of current models.
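One way such a QA-based formulation of ID and VE might look is sketched below; the question templates, intent description, and example utterance are hypothetical illustrations, not SQATIN's actual prompts.

```python
# Sketch: recasting intent detection (ID) and value extraction (VE) as
# question answering. Templates and names are illustrative only.

def id_as_qa(utterance, intent_description):
    """Intent detection becomes a yes/no question per candidate intent."""
    return (f"Question: Does the user want to {intent_description}? "
            f"Utterance: {utterance} Answer:")

def ve_as_qa(utterance, slot_question):
    """Value extraction becomes an extractive question over the utterance."""
    return f"Question: {slot_question} Utterance: {utterance} Answer:"

utt = "Move 50 pounds to my savings account tomorrow"
prompts = [
    id_as_qa(utt, "transfer money between accounts"),
    ve_as_qa(utt, "How much money should be transferred?"),
]
for p in prompts:
    print(p)
```

Framing each intent and slot as a natural-language question is what lets an instruction-tuned model transfer across domains: new intents only require new questions, not new classifier heads.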
arXiv Detail & Related papers (2023-11-16T01:57:00Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English-only NLU++ dataset to include manual translations into a range of high-, medium-, and low-resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- AutoNLU: An On-demand Cloud-based Natural Language Understanding System for Enterprises [21.25334903155791]
We build a practical NLU model for handling various image-editing requests in Photoshop.
We build powerful keyphrase extraction models that achieve state-of-the-art results on two public benchmarks.
In both cases, end users only need to write a small amount of code to convert their datasets into a common format used by AutoNLU.
arXiv Detail & Related papers (2020-11-26T20:51:57Z)
- Schema-Guided Natural Language Generation [13.11874946084068]
We present the novel task of Schema-Guided Natural Language Generation (SG-NLG).
In SG-NLG, the goal is still to generate a natural language prompt, but the input MRs are paired with rich schemata providing contextual information.
We train different state-of-the-art models for neural natural language generation on this dataset and show that in many cases, including rich schema information allows our models to produce higher quality outputs.
arXiv Detail & Related papers (2020-05-11T23:01:22Z)
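A schema-guided input of the kind the last entry describes might be assembled as in the sketch below; the MR fields, schema descriptions, and linearisation scheme are illustrative assumptions, not the dataset's actual format.

```python
# Sketch of a meaning representation (MR) paired with a rich schema, as
# input to schema-guided NLG. Field names and format are illustrative.

mr = {"inform": {"restaurant_name": "Sotto", "cuisine": "Italian"}}

schema = {  # natural-language slot descriptions give the generator context
    "restaurant_name": "the name of the restaurant being discussed",
    "cuisine": "the type of food the restaurant serves",
}

def linearise(mr, schema):
    """Flatten MR + schema into one input string for a seq2seq generator."""
    parts = []
    for act, slots in mr.items():
        for slot, value in slots.items():
            parts.append(f"{act}({slot}={value} | {schema[slot]})")
    return " ; ".join(parts)

print(linearise(mr, schema))
```

Pairing each slot value with its schema description is what distinguishes this input from a bare MR: the generator can condition on "the type of food the restaurant serves" rather than on the opaque slot name `cuisine`.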
This list is automatically generated from the titles and abstracts of the papers on this site.