Robustness Testing of Language Understanding in Dialog Systems
- URL: http://arxiv.org/abs/2012.15262v1
- Date: Wed, 30 Dec 2020 18:18:47 GMT
- Authors: Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Weiran Nie,
Hongyan Li, Cheng Li, Wei Peng, Minlie Huang
- Abstract summary: We conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models.
We introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation.
We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most language understanding models in dialog systems are trained on a small
amount of annotated training data and evaluated on a small test set drawn from the
same distribution. However, these models can lead to system failure or undesirable
outputs when exposed to natural perturbations in practice. In this paper,
we conduct comprehensive evaluation and analysis with respect to the robustness
of natural language understanding models, and introduce three important aspects
related to language understanding in real-world dialog systems, namely,
language variety, speech characteristics, and noise perturbation. We propose a
model-agnostic toolkit LAUG to approximate natural perturbation for testing the
robustness issues in dialog systems. Four data augmentation approaches covering
the three aspects are assembled in LAUG, which reveals critical robustness
issues in state-of-the-art models. The dataset augmented by LAUG can be
used to facilitate future research on the robustness testing of language
understanding in dialog systems.
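The paper's toolkit itself is not shown here, but the noise-perturbation aspect it tests can be illustrated with a minimal sketch. The function below is hypothetical (not LAUG's actual API): it mimics natural noise by randomly dropping words and swapping adjacent characters in an utterance, with rates and a seed as assumed parameters.

```python
import random


def perturb_utterance(text: str, typo_rate: float = 0.1,
                      drop_rate: float = 0.1, seed: int = 0) -> str:
    """Apply simple noise perturbations to a dialog utterance.

    Illustrative sketch only -- not the actual LAUG implementation.
    Drops some words (mimicking noisy input) and swaps adjacent
    characters (mimicking typos).
    """
    rng = random.Random(seed)  # seeded for reproducible test sets
    words = []
    for word in text.split():
        # Randomly drop longer words to simulate lossy/noisy input.
        if len(word) > 3 and rng.random() < drop_rate:
            continue
        # Swap two adjacent characters to simulate a typo.
        if len(word) > 3 and rng.random() < typo_rate:
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)


original = "book a table for two people at an italian restaurant"
print(perturb_utterance(original, typo_rate=0.3, drop_rate=0.1, seed=42))
```

Running an NLU model on both the original and perturbed utterances, and comparing slot/intent predictions, is one simple way to probe the robustness gap the paper measures.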
Related papers
- Training Zero-Shot Generalizable End-to-End Task-Oriented Dialog System Without Turn-level Dialog Annotations [2.757798192967912]
This work employs multi-task instruction fine-tuning to create more efficient and scalable task-oriented dialogue systems.
Our approach outperforms both state-of-the-art models trained on annotated data and billion-parameter off-the-shelf ChatGPT models.
arXiv Detail & Related papers (2024-07-21T04:52:38Z) - A Comparative Analysis of Conversational Large Language Models in
Knowledge-Based Text Generation [5.661396828160973]
We conduct an empirical analysis of conversational large language models in generating natural language text from semantic triples.
We compare four large language models of varying sizes with different prompting techniques.
Our findings show that the capabilities of large language models in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques.
arXiv Detail & Related papers (2024-02-02T15:26:39Z) - Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive
Learning [71.8876256714229]
We propose an entity-based contrastive learning framework for improving the robustness of knowledge-grounded dialogue systems.
Our method achieves new state-of-the-art performance in terms of automatic evaluation scores.
arXiv Detail & Related papers (2024-01-09T05:16:52Z) - Evaluating Robustness of Dialogue Summarization Models in the Presence
of Naturally Occurring Variations [13.749495524988774]
We systematically investigate the impact of real-life variations on state-of-the-art dialogue summarization models.
We introduce two types of perturbations: utterance-level perturbations that modify individual utterances with errors and language variations, and dialogue-level perturbations that add non-informative exchanges.
We find that both fine-tuned and instruction-tuned models are affected by input variations, with the latter being more susceptible.
arXiv Detail & Related papers (2023-11-15T05:11:43Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog
Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z) - Learning Locality and Isotropy in Dialogue Modeling [28.743212772593335]
We propose a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces.
Experimental results show that our approach significantly outperforms the current state-of-the-art models on three dialogue tasks.
arXiv Detail & Related papers (2022-05-29T06:48:53Z) - Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z) - Language Model as an Annotator: Exploring DialoGPT for Dialogue
Summarization [29.887562761942114]
We show how DialoGPT, a pre-trained model for conversational response generation, can be developed as an unsupervised dialogue annotator.
We apply DialoGPT to label three types of features on two dialogue summarization datasets, SAMSum and AMI, and employ pre-trained and non-pre-trained models as our summarizers.
arXiv Detail & Related papers (2021-05-26T13:50:13Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Data Augmentation for Spoken Language Understanding via Pretrained
Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.