Towards LLM-driven Dialogue State Tracking
- URL: http://arxiv.org/abs/2310.14970v1
- Date: Mon, 23 Oct 2023 14:15:28 GMT
- Title: Towards LLM-driven Dialogue State Tracking
- Authors: Yujie Feng, Zexin Lu, Bo Liu, Liming Zhan, Xiao-Ming Wu
- Abstract summary: Large language models (LLMs) such as GPT-3 and ChatGPT have sparked considerable interest in assessing their efficacy across diverse applications.
We present LDST, an LLM-driven Dialogue State Tracking framework based on smaller, open-source foundation models.
We find that LDST exhibits remarkable performance improvements in both zero-shot and few-shot settings compared to previous SOTA methods.
- Score: 13.679946384741008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue State Tracking (DST) is of paramount importance in ensuring accurate
tracking of user goals and system actions within task-oriented dialogue
systems. The emergence of large language models (LLMs) such as GPT-3 and ChatGPT
has sparked considerable interest in assessing their efficacy across diverse
applications. In this study, we conduct an initial examination of ChatGPT's
capabilities in DST. Our evaluation uncovers the exceptional performance of
ChatGPT in this task, offering valuable insights to researchers regarding its
capabilities and providing useful directions for designing and enhancing
dialogue systems. Despite its impressive performance, ChatGPT has significant
limitations, including its closed-source nature, request restrictions, data
privacy concerns, and lack of local deployment capabilities. To address
these concerns, we present LDST, an LLM-driven DST framework based on smaller,
open-source foundation models. By utilizing a novel domain-slot instruction
tuning method, LDST achieves performance on par with ChatGPT. Through comprehensive
evaluations across three distinct experimental settings, we find that LDST
exhibits remarkable performance improvements in both zero-shot and few-shot
settings compared to previous SOTA methods. The source code is provided for
reproducibility.
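The domain-slot instruction tuning idea described in the abstract can be illustrated with a minimal, hypothetical prompt builder. The template wording, slot names, and the 'none' convention below are illustrative assumptions, not the exact format used by LDST:

```python
# Sketch of a domain-slot instruction prompt for DST: one instruction
# is built per (domain, slot) pair, and the model answers with the
# slot's value, or 'none' if the slot is not mentioned.
# Template and slot inventory are illustrative assumptions.

def build_dst_prompt(domain: str, slot: str, dialogue: list[str]) -> str:
    """Build an instruction asking for one slot's value given the
    dialogue history so far."""
    history = "\n".join(dialogue)
    return (
        f"Track the dialogue state for domain '{domain}'.\n"
        f"Dialogue:\n{history}\n"
        f"Question: what is the value of slot '{slot}'? "
        f"Answer 'none' if unspecified.\n"
        f"Answer:"
    )

prompt = build_dst_prompt(
    "hotel",
    "price range",
    ["User: I need a cheap hotel in the centre.",
     "System: Okay, looking for budget hotels."],
)
```

One prompt per domain-slot pair keeps each training example focused on a single prediction, which is what makes this style of instruction tuning tractable for smaller open-source models.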
Related papers
- Large Language Models as Zero-shot Dialogue State Tracker through Function Calling [42.00097476584174]
We propose a novel approach for solving dialogue state tracking with large language models (LLMs) through function calling.
This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning.
We show that our approach achieves exceptional performance with both modestly sized open-source and proprietary LLMs.
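The function-calling approach summarized above can be sketched as follows: the LLM is given a per-domain function schema and emits a call whose arguments are the slot values. The schema, domain name, and parsing logic below are illustrative assumptions, not the paper's exact design:

```python
import json

# Hypothetical function schema for a 'hotel' domain: each parameter
# corresponds to one trackable slot.
HOTEL_SCHEMA = {
    "name": "track_hotel_state",
    "parameters": {
        "type": "object",
        "properties": {
            "price_range": {"type": "string"},
            "area": {"type": "string"},
        },
    },
}

def parse_state(function_call_args: str) -> dict:
    """Turn the model's function-call arguments (a JSON string) into a
    flat {domain-slot: value} dialogue state, dropping empty slots."""
    args = json.loads(function_call_args)
    return {f"hotel-{slot}": value for slot, value in args.items() if value}

state = parse_state('{"price_range": "cheap", "area": "centre"}')
# state == {"hotel-price_range": "cheap", "hotel-area": "centre"}
```

Because the schema constrains the output to known slots, no task-specific fine-tuning is needed to adapt to a new domain, which is what enables the zero-shot behavior the summary describes.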
arXiv Detail & Related papers (2024-02-16T06:13:18Z)
- Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z)
- Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT [37.27411474856601]
Out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent to open-world intent sets.
Previous methods address them by fine-tuning discriminative models.
ChatGPT exhibits consistent advantages under zero-shot settings, but is still at a disadvantage compared to fine-tuned models.
arXiv Detail & Related papers (2023-10-16T08:34:44Z)
- Evaluating ChatGPT as a Recommender System: A Rigorous Approach [12.458752059072706]
We propose a robust evaluation pipeline to assess ChatGPT's ability as an RS and post-process ChatGPT recommendations.
We analyze the model's functionality in three settings: the Top-N Recommendation, the cold-start recommendation, and the re-ranking of a list of recommendations.
arXiv Detail & Related papers (2023-09-07T10:13:09Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% improvement over GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performance on most NLP tasks is still well below the supervised baselines.
In this work, we investigate the causes and find that its subpar performance stems from several factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z)
- A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets.
Specifically, we evaluate ChatGPT across 140 tasks and analyze the 255K responses it generates on these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling.
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
- Prompt Learning for Few-Shot Dialogue State Tracking [75.50701890035154]
This paper focuses on how to learn a dialogue state tracking (DST) model efficiently with limited labeled data.
We design a prompt learning framework for few-shot DST, which consists of two main components: value-based prompt and inverse prompt mechanism.
Experiments show that our model can generate unseen slots and outperforms existing state-of-the-art few-shot methods.
arXiv Detail & Related papers (2022-01-15T07:37:33Z)
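The zero-shot and few-shot DST comparisons that recur throughout these papers are conventionally reported in joint goal accuracy, the standard DST metric: the fraction of turns where the predicted state matches the gold state on every slot. A minimal reference sketch (the dictionary-based state representation is an assumption):

```python
def joint_goal_accuracy(predictions: list[dict], golds: list[dict]) -> float:
    """Fraction of turns where the predicted dialogue state matches
    the gold state exactly (every slot-value pair, no extras)."""
    correct = sum(pred == gold for pred, gold in zip(predictions, golds))
    return correct / len(golds)

acc = joint_goal_accuracy(
    [{"hotel-area": "centre"}, {"hotel-area": "north"}],
    [{"hotel-area": "centre"}, {"hotel-area": "south"}],
)
# acc == 0.5
```

Because a single wrong or missing slot fails the whole turn, joint goal accuracy is a strict metric, which is why even modest reported gains on it are meaningful.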
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.