Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken
Language Understanding
- URL: http://arxiv.org/abs/2305.13512v2
- Date: Thu, 17 Aug 2023 19:23:20 GMT
- Title: Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken
Language Understanding
- Authors: Mutian He, Philip N. Garner
- Abstract summary: Large pretrained language models have demonstrated strong language understanding capabilities.
We evaluate several such models like ChatGPT and OPT of different sizes on multiple benchmarks.
We show, however, that the model is worse at slot filling, and its performance is sensitive to ASR errors.
- Score: 13.352795145385645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large pretrained language models have demonstrated strong language
understanding capabilities. This is particularly reflected in their zero-shot
and in-context learning abilities on downstream tasks through prompting. To
assess their impact on spoken language understanding (SLU), we evaluate several
such models like ChatGPT and OPT of different sizes on multiple benchmarks. We
verify the emergent ability unique to the largest models as they can reach
intent classification accuracy close to that of supervised models with zero or
few shots on various languages given oracle transcripts. By contrast, the
results for smaller models fitting a single GPU fall far behind. We note that
the error cases often arise from the annotation scheme of the dataset;
responses from ChatGPT are still reasonable. We show, however, that the model
is worse at slot filling, and its performance is sensitive to ASR errors,
suggesting serious challenges for the application of those textual models on
SLU.
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts)
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - Perturbed examples reveal invariances shared by language models [8.04604449335578]
We introduce a novel framework to compare two NLP models.
Via experiments on models from the same and different architecture families, this framework offers insights about how changes in models affect linguistic capabilities.
arXiv Detail & Related papers (2023-11-07T17:48:35Z) - Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-start finetuned smaller models, we present a spectrum showing large language models falling behind, are comparable, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z) - Scalable Performance Analysis for Vision-Language Models [26.45624201546282]
Joint vision-language models have shown great performance over a diverse set of tasks.
Our paper introduces a more scalable solution that relies on already annotated benchmarks.
We confirm previous findings that CLIP behaves like a bag of words model and performs better with nouns and verbs.
arXiv Detail & Related papers (2023-05-30T06:40:08Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Internet-augmented language models through few-shot prompting for
open-domain question answering [6.573232954655063]
We capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges.
We use few-shot prompting to learn to condition language models on information returned from the web using Google Search.
We find that language models conditioned on the web surpass performance of closed-book models of similar, or even larger, model sizes in open-domain question answering.
arXiv Detail & Related papers (2022-03-10T02:24:14Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts on target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z) - Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.