Related papers: Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

URL: http://arxiv.org/abs/2310.10176v1
Date: Mon, 16 Oct 2023 08:34:44 GMT
Title: Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT
Authors: Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
Abstract summary: Out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent to open-world intent sets. Previous methods address them by fine-tuning discriminative models. ChatGPT exhibits consistent advantages under zero-shot settings, but is still at a disadvantage compared to fine-tuned models.
Score: 37.27411474856601
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The tasks of out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent classifier to open-world intent sets, which is crucial to task-oriented dialogue (TOD) systems. Previous methods address them by fine-tuning discriminative models. Recently, although some studies have been exploring the application of large language models (LLMs) represented by ChatGPT to various downstream tasks, it is still unclear for the ability of ChatGPT to discover and incrementally extent OOD intents. In this paper, we comprehensively evaluate ChatGPT on OOD intent discovery and GID, and then outline the strengths and weaknesses of ChatGPT. Overall, ChatGPT exhibits consistent advantages under zero-shot settings, but is still at a disadvantage compared to fine-tuned models. More deeply, through a series of analytical experiments, we summarize and discuss the challenges faced by LLMs including clustering, domain-specific understanding, and cross-domain in-context learning scenarios. Finally, we provide empirical guidance for future directions to address these challenges.

Related papers

Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection. Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z)
Towards LLM-driven Dialogue State Tracking [13.679946384741008]
Large language models (LLMs) such as GPT3 and ChatGPT have sparked considerable interest in assessing their efficacy across diverse applications. We present LDST, an LLM-driven Dialogue State Tracking framework based on smaller, open-source foundation models. We find that LDST exhibits remarkable performance improvements in both zero-shot and few-shot setting compared to previous SOTA methods.
arXiv Detail & Related papers (2023-10-23T14:15:28Z)
ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection [30.13634341221476]
We present a case study exploring the use of ChatGPT as a data augmentation technique to enhance compositional generalization in open intent detection tasks. By incorporating synthetic data generated by ChatGPT into the training process, we demonstrate that our approach can effectively improve model performance.
arXiv Detail & Related papers (2023-08-25T17:51:23Z)
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets. Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z)
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP) This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding [55.37338324658501]
Zero-shot dialogue understanding aims to enable dialogue to track the user's needs without any training data. In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks.
arXiv Detail & Related papers (2023-04-09T15:28:36Z)
ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about [15.19126287569545]
This research examines the responses generated by ChatGPT from different Conversational QA corpora. The study employed BERT similarity scores to compare these responses with correct answers and obtain Natural Language Inference(NLI) labels. The study identified instances where ChatGPT provided incorrect answers to questions, providing insights into areas where the model may be prone to error.
arXiv Detail & Related papers (2023-04-06T18:42:47Z)
To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection. We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains. Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective [67.98821225810204]
We evaluate the robustness of ChatGPT from the adversarial and out-of-distribution perspective. Results show consistent advantages on most adversarial and OOD classification and translation tasks. ChatGPT shows astounding performance in understanding dialogue-related texts.
arXiv Detail & Related papers (2023-02-22T11:01:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.