Related papers: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

URL: http://arxiv.org/abs/2406.18518v1
Date: Wed, 26 Jun 2024 17:49:11 GMT
Title: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong,
Abstract summary: APIGen is an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains.
Score: 99.8988504388011
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/

Related papers

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay [86.01901238059261]
APIGen-MT is a framework that generates verifiable and diverse multi-turn agent data. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $tau$-bench and BFCL benchmarks.
arXiv Detail & Related papers (2025-04-04T17:13:57Z)
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process. Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models [64.28420991770382]
We present Data-Juicer 2.0, a new system offering fruitful data processing capabilities backed by over a hundred operators. The system is publicly available, actively maintained, and broadly adopted in diverse research endeavors, practical applications, and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline [7.487352346469893]
We propose a training pipeline for function-calling capabilities tailored to real-world business scenarios. This pipeline includes the synthesis and augmentation of scenario-specific function-calling data, model fine-tuning, and performance evaluation and analysis. Our fine-tuned model demonstrated outstanding performance in evaluations and practical applications, surpassing GPT-4 and GPT-4o in accuracy on the test set.
arXiv Detail & Related papers (2024-12-20T08:20:21Z)
REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models [14.023953508288628]
Retrieval augmented generation (RAG) pipelines are commonly used in tasks such as question-answering (QA) We propose REFINE, a novel technique that generates synthetic data from available documents and then uses a model fusion approach to fine-tune embeddings.
arXiv Detail & Related papers (2024-10-16T08:43:39Z)
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls [24.183027762617233]
We present NESTFUL, a benchmark to evaluate large language models (LLMs) on nested sequences of API calls. We show that the best-performing model on the dataset has a full sequence match accuracy of 25% and win-rate of 34%.
arXiv Detail & Related papers (2024-09-04T17:53:24Z)
ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability. Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks [35.97890508648945]
We introduce the-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks. We show that-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
arXiv Detail & Related papers (2024-06-27T17:47:26Z)
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model [90.71963723884944]
Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research. We introduce DiffAgent, an agent designed to screen the accurate selection in seconds via API calls. Our evaluations reveal that DiffAgent not only excels in identifying the appropriate T2I API but also underscores the effectiveness of the SFTA training framework.
arXiv Detail & Related papers (2024-03-31T06:28:15Z)
Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform. We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision. We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
DataLab: A Platform for Data Analysis and Intervention [96.75253335629534]
DataLab is a unified data-oriented platform that allows users to interactively analyze the characteristics of data. toolname has features for dataset recommendation and global vision analysis. So far, DataLab covers 1,715 datasets and 3,583 of its transformed version.
arXiv Detail & Related papers (2022-02-25T18:32:19Z)
Compositional Generalization for Natural Language Interfaces to Web APIs [26.851998759793453]
This paper presents Okapi, a new dataset for Natural Language to executable web Application Programming Interfaces (NL2API) This dataset is in English and contains 22,508 questions and 9,019 unique API calls, covering three domains. We define new compositional generalization tasks for NL2API which explore the models' ability to extrapolate from simple API calls in the training set to new and more complex API calls in the inference phase.
arXiv Detail & Related papers (2021-12-09T20:49:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.