Related papers: Efficient Function Orchestration for Large Language Models

URL: http://arxiv.org/abs/2504.14872v1
Date: Mon, 21 Apr 2025 05:57:34 GMT
Title: Efficient Function Orchestration for Large Language Models
Authors: Xiaoxia Liu, Peng Di, Cong Li, Jun Sun, Jingyi Wang
Abstract summary: This paper introduces LLMOrch, an advanced framework for automated, parallel function calling in large language models. The key principle behind LLMOrch is to identify an available processor to execute a function call while preventing any single processor from becoming overburdened. When comparing with state-of-the-art techniques, LLMOrch demonstrated comparable efficiency improvements in orchestrating I/O-intensive functions.
Score: 10.061268352576406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Function calling is a fundamental capability of today's large language models, but sequential function calling poses efficiency problems. Recent studies have proposed requesting function calls with parallelism support in order to alleviate this issue. However, they either delegate the concurrent function calls to users for execution, where they are nonetheless executed sequentially, or overlook the relations among various function calls, resulting in limited efficiency. This paper introduces LLMOrch, an advanced framework for automated, parallel function calling in large language models. The key principle behind LLMOrch is to identify an available processor to execute a function call while preventing any single processor from becoming overburdened. To this end, LLMOrch models the data relations (i.e., def-use) among different function calls and coordinates their executions by their control relations (i.e., mutual-exclusion) as well as the working status of the underlying processors. When compared with state-of-the-art techniques, LLMOrch demonstrated comparable efficiency improvements in orchestrating I/O-intensive functions, while significantly outperforming them (2x) with compute-intensive functions. LLMOrch's performance even showed a linear correlation with the number of allocated processors. We believe that these results highlight the potential of LLMOrch as an efficient solution for parallel function orchestration in the context of large language models.
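
Illustrative sketch (not from the paper): the abstract describes modeling def-use relations among function calls and dispatching each call to an available processor. The Python snippet below is a minimal sketch of that general idea only, assuming a bounded thread pool stands in for the "processors". The names orchestrate, calls, and max_workers are hypothetical and are not LLMOrch's API. Mutual-exclusion relations and processor-status tracking are not modeled here.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def orchestrate(calls, max_workers=4):
    """Run function calls in parallel while respecting def-use dependencies.

    `calls` maps a result name to a pair (function, names_of_results_it_uses).
    This layout is an assumption made for illustration.
    """
    results = {}           # result name -> value, once its call has finished
    pending = dict(calls)  # calls not yet submitted
    running = {}           # future -> result name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # A call is ready once every result it uses has been defined.
            ready = [name for name, (_, uses) in pending.items()
                     if all(u in results for u in uses)]
            for name in ready:
                fn, uses = pending.pop(name)
                future = pool.submit(fn, *(results[u] for u in uses))
                running[future] = name
            if not running:
                # Nothing is ready and nothing is in flight: cycle or missing input.
                raise ValueError("unsatisfiable dependencies: "
                                 + ", ".join(sorted(pending)))
            # Block until at least one in-flight call completes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Hypothetical example: "greeting" uses the results of "user" and "weather",
# which are independent of each other and may run concurrently.
example_calls = {
    "user": (lambda: {"id": 7}, []),
    "weather": (lambda: "sunny", []),
    "greeting": (lambda user, weather: f"user {user['id']}: {weather}",
                 ["user", "weather"]),
}
print(orchestrate(example_calls)["greeting"])  # prints "user 7: sunny"
```

A thread pool suits I/O-intensive functions. For the compute-intensive case the abstract mentions, a process pool would be the more plausible stand-in in Python, though that choice is likewise an assumption rather than something the source states.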

Related papers

Learning Adaptive Parallel Reasoning with Language Models [70.1745752819628]
We propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures.
arXiv Detail & Related papers (2025-04-21T22:29:02Z)
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation [85.68881632498909]
We propose a principled framework for synthesizing high-quality training trajectories for large language model agents. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. Experiments show that, trained on the positive trajectories with supervised fine-tuning and with preference optimization against negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery.
arXiv Detail & Related papers (2025-03-10T20:13:07Z)
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario [17.494787282066866]
We introduce ComplexFuncBench, a benchmark for complex function calling across five real-world scenarios. Compared to existing benchmarks, ComplexFuncBench encompasses multi-step and constrained function calling. We propose an automatic framework, ComplexEval, for quantitatively evaluating complex function calling tasks.
arXiv Detail & Related papers (2025-01-17T11:41:53Z)
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling. It focuses on specific logical and mathematical reasoning tasks. The approach aims to improve the performance of small-scale models on these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains. We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks [35.97890508648945]
We introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks. We show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
arXiv Detail & Related papers (2024-06-27T17:47:26Z)
An LLM-Tool Compiler for Fused Parallel Function Calling [1.990293258268139]
State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling. We propose LLM-Tool Compiler, which fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM. Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.
arXiv Detail & Related papers (2024-05-07T18:55:50Z)
An LLM Compiler for Parallel Function Calling [68.04566807806071]
We introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to 9% compared to ReAct.
arXiv Detail & Related papers (2023-12-07T18:32:04Z)
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing. LLMs are extremely computationally expensive, even at inference time. We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.