Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
- URL: http://arxiv.org/abs/2503.07826v1
- Date: Mon, 10 Mar 2025 20:13:07 GMT
- Title: Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
- Authors: Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister
- Abstract summary: We propose a principled framework for synthesizing high-quality training trajectories for large language model agents. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. Experiments show that, with supervised fine-tuning on the positive trajectories and preference optimization against the negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery.
- Score: 85.68881632498909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have exhibited the ability to effectively utilize external tools to address user queries. However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function-calling capability of large language model agents in multi-turn conversations with humans. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. We model the complicated function interactions in multi-turn cases with a graph and design novel node operations to build reliable signature paths. Motivated by context distillation, when guiding the generation of positive and negative trajectories using a teacher model, we provide reference function call sequences as positive hints in context and contrastive, incorrect function calls as negative hints. Experiments show that, with supervised fine-tuning on the positive trajectories and preference optimization against the negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery, surpassing the teacher model Gemini-1.5-pro-002 by a large margin in function calling.
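To make the pipeline concrete, here is a minimal, hypothetical sketch of the graph-translation idea: walk a function-dependency graph to form a signature path, apply a node operation to enrich it, and translate each signature into a query/call turn. All function names, the toy node operation, and the trajectory format are our assumptions, not the paper's implementation.

```python
# Minimal sketch of Magnet-style graph translation (illustrative only).
# The function graph, node operation, and query templates are invented.

# Directed graph: an edge u -> v means v can consume u's output.
FUNCTION_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["send_confirmation"],
    "send_confirmation": [],
}

def build_signature_path(graph, start, max_len=3):
    """Walk the graph to produce a function signature path."""
    path, node = [start], start
    while graph[node] and len(path) < max_len:
        node = graph[node][0]          # take the first successor (toy policy)
        path.append(node)
    return path

def insert_node(path, index, name):
    """One toy 'node operation': splice an extra function into the path."""
    return path[:index] + [name] + path[index:]

def translate_to_trajectory(path):
    """Translate the signature path into (query, function call) turns."""
    return [
        {"turn": i + 1,
         "query": f"User asks for a step requiring `{fn}`.",
         "call": f"{fn}(...)"}
        for i, fn in enumerate(path)
    ]

path = build_signature_path(FUNCTION_GRAPH, "search_flights")
path = insert_node(path, 1, "check_visa_rules")   # enrich the path
for turn in translate_to_trajectory(path):
    print(turn)
```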
Related papers
- Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling [6.102559098873098]
Function calling is a complex task with widespread applications in domains such as information retrieval, software engineering and automation.
Large Language Models (LLMs) can automate this process but are computationally expensive and impractical in resource-constrained settings.
Small Language Models (SLMs) can operate efficiently, offering faster response times and lower computational demands.
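As background for the entries above and below: function calling means the model emits a structured call against a declared tool schema rather than free text. A minimal illustration, with an invented schema and query:

```python
import json

# A declared tool schema the model is allowed to call (invented example).
tools = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {"city": {"type": "string", "required": True}},
}]

# Given "What's the weather in Oslo?", a function-calling model should
# emit a structured call like this instead of free-form text:
predicted_call = {"name": "get_weather", "arguments": {"city": "Oslo"}}

# A harness would check the call against the declared schema.
assert predicted_call["name"] in {t["name"] for t in tools}
print(json.dumps(predicted_call))
```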
arXiv Detail & Related papers (2025-04-27T15:26:51Z) - Reasoning with Reinforced Functional Token Tuning [70.96651128307985]
We propose Reinforced Functional Token Tuning (RFTT) to empower Large Language Models (LLMs) with self-play learn-to-reason capabilities. RFTT embeds a rich set of learnable functional tokens directly into the model vocabulary, enabling chain-of-thought construction with diverse human-like reasoning behaviors.
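Mechanically, embedding functional tokens into the vocabulary resembles registering special tokens and resizing the embedding table. A hedged sketch with Hugging Face transformers (the token names are invented, and this mirrors only the vocabulary step, not RFTT's self-play training):

```python
# Sketch: add learnable "functional" tokens to a model vocabulary.
# Requires `pip install transformers torch` and downloads GPT-2 weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical reasoning-behavior tokens (names are ours, not RFTT's).
functional_tokens = ["<plan>", "<decompose>", "<reflect>", "<verify>"]
tokenizer.add_special_tokens({"additional_special_tokens": functional_tokens})

# Grow the embedding table so the new tokens get trainable vectors.
model.resize_token_embeddings(len(tokenizer))
print(tokenizer.tokenize("<plan> solve the integral <verify>"))
```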
arXiv Detail & Related papers (2025-02-19T02:59:42Z) - HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios [31.43638572775755]
HammerBench is a novel framework for assessing mobile assistant function-calling capabilities in real-world, multi-turn dialogues. Our experiments reveal that different types of parameter name errors are a significant source of failure across different interaction scenarios.
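Parameter-name errors of the kind HammerBench reports are straightforward to detect by diffing predicted argument names against the gold call. A toy checker (the call format is our assumption, not HammerBench's harness):

```python
def param_name_errors(predicted_args, gold_args):
    """Classify argument-name mismatches between predicted and gold calls."""
    pred, gold = set(predicted_args), set(gold_args)
    return {
        "hallucinated": sorted(pred - gold),  # names the schema never defined
        "missing": sorted(gold - pred),       # expected names the model dropped
    }

gold = {"city": "Oslo", "unit": "celsius"}
pred = {"town": "Oslo", "unit": "celsius"}   # 'town' is not a valid name
print(param_name_errors(pred, gold))
# {'hallucinated': ['town'], 'missing': ['city']}
```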
arXiv Detail & Related papers (2024-12-21T07:33:55Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance. Existing direct preference learning algorithms were originally designed for single-turn chat tasks. We introduce a multi-turn direct preference learning framework tailored for this context.
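One natural way to extend single-turn DPO to trajectories, and a plausible simplification of what such a framework optimizes, is to score whole multi-turn rollouts by summing per-turn log-probabilities before applying the usual Bradley-Terry loss. A dependency-free sketch (the per-turn log-probabilities are placeholders you would obtain from the policy and reference models):

```python
import math

def log_sigmoid(x):
    return -math.log1p(math.exp(-x))

def multi_turn_dpo_loss(chosen_lp, rejected_lp,
                        chosen_ref_lp, rejected_ref_lp, beta=0.1):
    """DPO loss where each argument is a list of per-turn log-probs."""
    margin = (sum(chosen_lp) - sum(chosen_ref_lp)) - (
        sum(rejected_lp) - sum(rejected_ref_lp))
    return -log_sigmoid(beta * margin)

# Placeholder per-turn log-probs for a pair of 3-turn trajectories.
print(multi_turn_dpo_loss(
    chosen_lp=[-1.0, -0.8, -1.2], rejected_lp=[-1.5, -2.0, -1.7],
    chosen_ref_lp=[-1.1, -1.0, -1.3], rejected_ref_lp=[-1.4, -1.8, -1.6]))
```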
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data.
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z) - Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks [35.97890508648945]
We introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license.
The model is trained using a multi-task training approach on seven fundamental tasks.
We show that GRANITE-20B-FUNCTIONCALLING generalizes better across multiple tasks in seven different evaluation datasets.
arXiv Detail & Related papers (2024-06-27T17:47:26Z) - Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
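A simplified reading of the mechanism: treat the user's demonstrations as preferred outputs and the model's own samples as dispreferred, yielding preference pairs without an explicit reward model. The construction below is our toy rendering; the names and data are invented:

```python
def build_demo_preference_pairs(prompt, demonstrations, model_samples):
    """Pair every user demonstration (preferred) with a model sample
    (dispreferred) for the same prompt: a simplified DITTO-style setup."""
    return [
        {"prompt": prompt, "chosen": demo, "rejected": sample}
        for demo in demonstrations
        for sample in model_samples
    ]

pairs = build_demo_preference_pairs(
    prompt="Write a two-sentence product update email.",
    demonstrations=["Hi team, shipping v2 today. Details in the changelog."],
    model_samples=["Dear valued stakeholders, we are pleased to announce..."],
)
print(pairs[0]["chosen"])
```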
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - Mixture of Latent Experts Using Tensor Products [44.816454454687]
In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. We investigate whether modular language models can facilitate positive transfer and systematic generalization. Specifically, we propose a novel modular language model (TensorPoly) that balances parameter efficiency with nuanced routing methods.
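TensorPoly's tensor-product parameterization is beyond a short snippet, but the underlying modular-routing idea can be shown generically: a learned router softly mixes small expert modules. This is a generic mixture-of-experts sketch, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 16, 4

# Small expert modules plus a router (generic modular LM idea in miniature;
# NOT TensorPoly's actual tensor-product parameterization).
experts = [rng.standard_normal((d, d)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts)) * 0.02

def route(x):
    logits = x @ router_w
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                 # softmax over experts
    return sum(w * (x @ E) for w, E in zip(weights, experts))

print(route(rng.standard_normal(d)).shape)   # (16,)
```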
arXiv Detail & Related papers (2024-05-26T19:25:08Z) - Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents [41.14201835950814]
Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines.
Previous work first collects interaction trajectories between LLMs and environments, then uses only the trajectories that successfully finished the task to fine-tune smaller models.
We argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning strategies.
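A minimal version of the idea is to keep failed trajectories under some quality control and label them so the model can learn from them, rather than discarding them. The filter and tagging scheme below are our illustration, not the paper's exact recipe:

```python
def prepare_training_data(trajectories):
    """Split agent trajectories into positives and quality-controlled
    negatives instead of discarding failures outright (illustrative)."""
    data = []
    for traj in trajectories:
        if traj["success"]:
            data.append({"text": traj["actions"], "label": "good"})
        elif traj["steps"] <= 10:          # toy quality control on failures
            data.append({"text": traj["actions"], "label": "bad"})
    return data

trajs = [
    {"actions": "search -> click -> buy", "steps": 3, "success": True},
    {"actions": "search -> loop -> loop", "steps": 8, "success": False},
]
print(prepare_training_data(trajs))
```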
arXiv Detail & Related papers (2024-02-18T17:10:07Z) - Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
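The block-diagonal intuition can be illustrated with a toy penalty: given group assignments for tasks and features, penalize weight mass that falls outside matched blocks. This numpy sketch is our rendering of the structure, not TFCL's actual regularizer:

```python
import numpy as np

def off_block_penalty(W, task_groups, feature_groups):
    """L2 mass of task-feature weights outside matched groups (toy version
    of a block-diagonal structure regularizer)."""
    t = np.asarray(task_groups)[:, None]     # (tasks, 1)
    f = np.asarray(feature_groups)[None, :]  # (1, features)
    off_block = (t != f)                     # True where groups differ
    return float(np.sum((W * off_block) ** 2))

W = np.array([[1.0, 0.2, 0.0],
              [0.1, 0.9, 0.8]])
# Task 0 is in group 0; task 1 in group 1; features grouped as 0, 1, 1.
print(off_block_penalty(W, task_groups=[0, 1], feature_groups=[0, 1, 1]))
# Penalizes W[0,1], W[0,2], W[1,0]: 0.2**2 + 0.0 + 0.1**2 = 0.05
```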
arXiv Detail & Related papers (2020-04-29T02:32:04Z)