Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
- URL: http://arxiv.org/abs/2402.10466v4
- Date: Thu, 30 May 2024 04:19:54 GMT
- Title: Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
- Authors: Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook,
- Abstract summary: We propose a novel approach for solving dialogue state tracking with large language models (LLMs) through function calling.
This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning.
We show that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs.
- Score: 42.00097476584174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We have made the code publicly available at https://github.com/facebookresearch/FnCTOD
Related papers
- Towards LLM-driven Dialogue State Tracking [13.679946384741008]
Large language models (LLMs) such as GPT3 and ChatGPT have sparked considerable interest in assessing their efficacy across diverse applications.
We present LDST, an LLM-driven Dialogue State Tracking framework based on smaller, open-source foundation models.
We find that LDST exhibits remarkable performance improvements in both zero-shot and few-shot setting compared to previous SOTA methods.
arXiv Detail & Related papers (2023-10-23T14:15:28Z) - ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an
Opportunity? [2.3555053092246125]
We present preliminary experimental results on the ChatGPT research preview, showing that ChatGPT achieves state-of-the-art performance in zero-shot DST.
We theorize that the in-context learning capabilities of such models will likely become powerful tools to support the development of dedicated and dynamic dialogue state trackers.
arXiv Detail & Related papers (2023-06-02T09:15:01Z) - Enhancing Chat Language Models by Scaling High-quality Instructional
Conversations [91.98516412612739]
We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat.
Our objective is to capture the breadth of interactions that a human might have with an AI assistant.
We fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA.
arXiv Detail & Related papers (2023-05-23T16:49:14Z) - Task-Optimized Adapters for an End-to-End Task-Oriented Dialogue System [0.0]
We propose an End-to-end TOD system with Task-d Adapters which learn independently per task, adding only small number of parameters after fixed layers of pre-trained network.
Our method is a model-agnostic approach and does not require prompt-tuning as only input data without a prompt.
arXiv Detail & Related papers (2023-05-04T00:17:49Z) - Exploring the Trade-Offs: Unified Large Language Models vs Local
Fine-Tuned Models for Highly-Specific Radiology NLI Task [49.50140712943701]
We evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples.
We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty.
arXiv Detail & Related papers (2023-04-18T17:21:48Z) - Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on
Self-Chat Data [101.63682141248069]
Chat models, such as ChatGPT, have shown impressive capabilities and have been rapidly adopted across numerous domains.
We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT.
We employ parameter-efficient tuning to enhance LLaMA, an open-source large language model.
arXiv Detail & Related papers (2023-04-03T17:59:09Z) - Stabilized In-Context Learning with Pre-trained Language Models for Few
Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z) - Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules, namely, Natural Language Understanding (NLU), a Dialogue State Tracking (DST), Dialogue Policy (DP) and Natural Language Generation (NLG)
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.