ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
- URL: http://arxiv.org/abs/2503.00564v1
- Date: Sat, 01 Mar 2025 17:23:51 GMT
- Title: ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
- Authors: Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim, Yohan Jo
- Abstract summary: We release ToolDial, a dataset comprising 11,111 multi-turn dialogues, with an average of 8.95 turns per dialogue, based on APIs from RapidAPI. We simulate dialogues where the system requests necessary information from the user based on API documentation and seeks additional APIs if the user fails to provide the required information. We evaluate a suite of language models on their ability to predict correct actions and extract input parameter values for API calls from the dialogue history.
- Score: 1.82618237315022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tool-Augmented Language Models (TALMs) leverage external APIs to answer user queries across various domains. However, existing benchmark datasets for TALM research often feature simplistic dialogues that do not reflect real-world scenarios, such as the need for models to ask clarifying questions or proactively call additional APIs when essential information is missing. To address these limitations, we construct and release ToolDial, a dataset comprising 11,111 multi-turn dialogues, with an average of 8.95 turns per dialogue, based on APIs from RapidAPI. ToolDial has two key characteristics. First, the dialogues incorporate 16 user and system actions (e.g., "Request", "Clarify", "Fail inform") to capture the rich dynamics of real-world interactions. Second, we simulate dialogues where the system requests necessary information from the user based on API documentation and seeks additional APIs if the user fails to provide the required information. To facilitate this process, we introduce a method for generating an API graph that represents input and output compatibility between APIs. Using ToolDial, we evaluate a suite of language models on their ability to predict correct actions and extract input parameter values for API calls from the dialogue history. Modern language models achieve accuracy scores below 70%, indicating substantial room for improvement. We release our dataset and code at https://github.com/holi-lab/ToolDial.
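The abstract mentions an API graph that represents input and output compatibility between APIs. The sketch below is one way to picture such a graph, not the authors' method: it assumes compatibility means an exact match between an output field name and an input field name, and the API names, fields, and helper functions (`build_api_graph`, `providers_for`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical API specs for illustration only (not taken from RapidAPI or ToolDial).
APIS = {
    "search_city": {"inputs": ["city_name"], "outputs": ["city_id"]},
    "get_weather": {"inputs": ["city_id", "date"], "outputs": ["forecast"]},
    "get_hotels": {"inputs": ["city_id"], "outputs": ["hotel_list"]},
}


def build_api_graph(apis):
    """Return {source_api: [(target_api, shared_field), ...]}.

    An edge source -> target means an output field of `source` can fill
    an input field of `target`. Assumption: fields are compatible only
    when their names match exactly.
    """
    graph = defaultdict(list)
    for src, src_spec in apis.items():
        for dst, dst_spec in apis.items():
            if src == dst:
                continue  # ignore self-loops
            for field in src_spec["outputs"]:
                if field in dst_spec["inputs"]:
                    graph[src].append((dst, field))
    return dict(graph)


def providers_for(graph, target, missing_field):
    """List APIs whose output could supply `missing_field` for `target`."""
    return sorted(
        src for src, edges in graph.items() if (target, missing_field) in edges
    )


graph = build_api_graph(APIS)
```

With these toy specs, `providers_for(graph, "get_weather", "city_id")` points to `search_city`, which mirrors the scenario in the abstract where the system seeks an additional API when the user cannot provide a required input.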
Related papers
- ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data.
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
- Interpreting User Requests in the Context of Natural Language Standing Instructions [89.12540932734476]
We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains.
A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue.
arXiv Detail & Related papers (2023-11-16T11:19:26Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- q2d: Turning Questions into Dialogs to Teach Models How to Search [11.421839177607147]
We propose q2d: an automatic data generation pipeline that generates information-seeking dialogs from questions.
Unlike previous approaches, which relied on human-written dialogs with search queries, our method automatically generates query-based grounded dialogs with better control and scale.
arXiv Detail & Related papers (2023-04-27T16:39:15Z)
- Dialog2API: Task-Oriented Dialogue with API Description and Example Programs [57.336201096903466]
We introduce a new paradigm for task-oriented dialogue - Dialog2API - to greatly expand the functionality and provide seamless dialogue experience.
The model also manages the dialogue policy and interacts with the user by generating appropriate natural language responses.
Dialog2API can work with many application scenarios such as software automation and customer service.
arXiv Detail & Related papers (2022-12-20T01:52:46Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-Based Simulation [13.943378554273377]
We propose NeuralWOZ, a novel dialogue collection framework that uses model-based dialogue simulation.
Collector generates dialogues from (1) user's goal instructions, which are the user context and task constraints in natural language, and (2) system's API call results.
Labeler annotates the generated dialogue by formulating the annotation as a multiple-choice problem, in which the candidate labels are extracted from goal instructions and API call results.
arXiv Detail & Related papers (2021-05-30T07:54:54Z)
- TicketTalk: Toward human-level performance with end-to-end, transaction-based dialog systems [10.659519248703273]
We present a data-driven, end-to-end approach to transaction-based dialog systems.
We show that the system performs at near-human levels in terms of verbal response quality and factual grounding accuracy.
We introduce TicketTalk, a movie ticketing dialog dataset with 23,789 annotated conversations.
arXiv Detail & Related papers (2020-12-23T02:43:37Z)
- Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems [14.206866126142002]
Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket.
We propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema.
We achieve an 18-50% relative accuracy improvement on a held-out test set compared to a baseline dialog generation approach.
arXiv Detail & Related papers (2020-11-16T19:39:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.