Large Language Models Meet Open-World Intent Discovery and Recognition:
An Evaluation of ChatGPT
- URL: http://arxiv.org/abs/2310.10176v1
- Date: Mon, 16 Oct 2023 08:34:44 GMT
- Title: Large Language Models Meet Open-World Intent Discovery and Recognition:
An Evaluation of ChatGPT
- Authors: Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang
Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
- Abstract summary: Out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent classifier to open-world intent sets.
Previous methods address them by fine-tuning discriminative models.
ChatGPT exhibits consistent advantages under zero-shot settings, but is still at a disadvantage compared to fine-tuned models.
- Score: 37.27411474856601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The tasks of out-of-domain (OOD) intent discovery and generalized intent
discovery (GID) aim to extend a closed intent classifier to open-world intent
sets, which is crucial to task-oriented dialogue (TOD) systems. Previous
methods address them by fine-tuning discriminative models. Recently, although
some studies have explored applying large language models
(LLMs), represented by ChatGPT, to various downstream tasks, the ability of
ChatGPT to discover and incrementally extend OOD intents remains unclear. In
this paper, we comprehensively evaluate ChatGPT on OOD intent discovery and
GID, and then outline the strengths and weaknesses of ChatGPT. Overall, ChatGPT
exhibits consistent advantages under zero-shot settings, but is still at a
disadvantage compared to fine-tuned models. Going further, through a series of
analytical experiments, we summarize and discuss the challenges faced by LLMs
including clustering, domain-specific understanding, and cross-domain
in-context learning scenarios. Finally, we provide empirical guidance for
future directions to address these challenges.
Related papers
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regard to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z)
- Towards LLM-driven Dialogue State Tracking [13.679946384741008]
Large language models (LLMs) such as GPT3 and ChatGPT have sparked considerable interest in assessing their efficacy across diverse applications.
We present LDST, an LLM-driven Dialogue State Tracking framework based on smaller, open-source foundation models.
We find that LDST exhibits remarkable performance improvements in both zero-shot and few-shot settings compared to previous SOTA methods.
arXiv Detail & Related papers (2023-10-23T14:15:28Z)
- ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection [30.13634341221476]
We present a case study exploring the use of ChatGPT as a data augmentation technique to enhance compositional generalization in open intent detection tasks.
By incorporating synthetic data generated by ChatGPT into the training process, we demonstrate that our approach can effectively improve model performance.
arXiv Detail & Related papers (2023-08-25T17:51:23Z)
- A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets.
Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to previous models, our extensive experimental results show that ChatGPT performs worse across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding [55.37338324658501]
Zero-shot dialogue understanding aims to enable dialogue systems to track the user's needs without any training data.
In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks.
arXiv Detail & Related papers (2023-04-09T15:28:36Z)
- ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about [15.19126287569545]
This research examines the responses generated by ChatGPT from different Conversational QA corpora.
The study employed BERT similarity scores to compare these responses with correct answers and obtain Natural Language Inference (NLI) labels.
The study identified instances where ChatGPT provided incorrect answers to questions, providing insights into areas where the model may be prone to error.
arXiv Detail & Related papers (2023-04-06T18:42:47Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective [67.98821225810204]
We evaluate the robustness of ChatGPT from the adversarial and out-of-distribution perspective.
Results show consistent advantages on most adversarial and OOD classification and translation tasks.
ChatGPT shows astounding performance in understanding dialogue-related texts.
arXiv Detail & Related papers (2023-02-22T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.