NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language
Evaluation
- URL: http://arxiv.org/abs/2111.05196v1
- Date: Tue, 9 Nov 2021 15:09:06 GMT
- Title: NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language
Evaluation
- Authors: David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais,
Mehdi Rezagholizadeh
- Abstract summary: Slot-filling and intent detection are the backbone of conversational agents such as voice assistants, and are active areas of research.
Even though state-of-the-art techniques on publicly available benchmarks show impressive performance, their ability to generalize to realistic scenarios is yet to be demonstrated.
We present, a set of simple spoken-language oriented transformations, applied to the evaluation set of datasets to introduce human spoken language variations while preserving the semantics of an utterance.
- Score: 7.813460653362095
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Slot-filling and intent detection are the backbone of conversational agents
such as voice assistants, and are active areas of research. Even though
state-of-the-art techniques on publicly available benchmarks show impressive
performance, their ability to generalize to realistic scenarios is yet to be
demonstrated. In this work, we present NATURE, a set of simple spoken-language
oriented transformations, applied to the evaluation set of datasets, to
introduce human spoken language variations while preserving the semantics of an
utterance. We apply NATURE to common slot-filling and intent detection
benchmarks and demonstrate that simple perturbations from the standard
evaluation set by NATURE can deteriorate model performance significantly.
Through our experiments we demonstrate that when NATURE operators are applied
to evaluation set of popular benchmarks the model accuracy can drop by up to
40%.
Related papers
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Retrieval-based Disentangled Representation Learning with Natural
Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employ a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within data through its natural language counterpart, thus disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z) - ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z) - Uncovering More Shallow Heuristics: Probing the Natural Language
Inference Capacities of Transformer-Based Pre-Trained Language Models Using
Syllogistic Patterns [9.031827448667086]
We explore the shallows used by transformer-based pre-trained language models (PLMs) that are fine-tuned for natural language inference (NLI)
We find evidence that the models rely heavily on certain shallows, picking up on symmetries and asymmetries between premise and hypothesis.
arXiv Detail & Related papers (2022-01-19T14:15:41Z) - Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z) - A Systematic Investigation of Commonsense Understanding in Large
Language Models [23.430757316504316]
Large language models have shown impressive performance on many natural language processing (NLP) tasks in a zero-shot setting.
We ask whether these models exhibit commonsense understanding by evaluating models against four commonsense benchmarks.
arXiv Detail & Related papers (2021-10-31T22:20:36Z) - Naturalness Evaluation of Natural Language Generation in Task-oriented
Dialogues using BERT [6.1478669848771546]
This paper presents an automatic method to evaluate the naturalness of natural language generation in dialogue systems.
By fine-tuning the BERT model, our proposed naturalness evaluation method shows robust results and outperforms the baselines.
arXiv Detail & Related papers (2021-09-07T08:40:14Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.