Related papers: CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing

CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing

URL: http://arxiv.org/abs/2501.05255v2
Date: Thu, 24 Apr 2025 15:43:32 GMT
Title: CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing
Authors: Yewei Song, Xunzhu Tang, Cedric Lothritz, Saad Ezzini, Jacques Klein, Tegawendé F. Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon,
Abstract summary: This work contributes to the evaluation and assessment of AI-based software development.<n>Novel benchmarking specifically designed for API function selection, parameter generation, and nested API execution.<n>An empirical evaluation of state-of-the-art language models, analyzing their performance.<n>A hybrid approach to API routing, combining general-purpose large language models for API selection with fine-tuned models and prompt engineering.
Score: 7.443502461016052
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: API-driven chatbot systems are increasingly integral to software engineering applications, yet their effectiveness hinges on accurately generating and executing API calls. This is particularly challenging in scenarios requiring multi-step interactions with complex parameterization and nested API dependencies. Addressing these challenges, this work contributes to the evaluation and assessment of AI-based software development through three key advancements: (1) the introduction of a novel dataset specifically designed for benchmarking API function selection, parameter generation, and nested API execution; (2) an empirical evaluation of state-of-the-art language models, analyzing their performance across varying task complexities in API function generation and parameter accuracy; and (3) a hybrid approach to API routing, combining general-purpose large language models for API selection with fine-tuned models and prompt engineering for parameter generation. These innovations significantly improve API execution in chatbot systems, offering practical methodologies for enhancing software design, testing, and operational workflows in real-world software engineering contexts.

Related papers

Evaluating LLMs on Sequential API Call Through Automated Test Generation [10.621357661774244]
StateGen is an automated framework designed to generate diverse coding tasks involving sequential API interactions.<n>We construct StateEval, a benchmark encompassing 120 verified test cases spanning across three representative scenarios.<n> Experimental results confirm that StateGen can effectively generate challenging and realistic API-oriented tasks.
arXiv Detail & Related papers (2025-07-13T03:52:51Z)
Seq2Seq Model-Based Chatbot with LSTM and Attention Mechanism for Enhanced User Interaction [1.937324318931008]
This work proposes a Sequence-to-Sequence (Seq2Seq) model with an encoder-decoder architecture that incorporates attention mechanisms and Long Short-Term Memory (LSTM) cells.<n>The proposed Seq2Seq model-based robot is trained, validated, and tested on a dataset specifically for the tourism sector in Draa-Tafilalet, Morocco.
arXiv Detail & Related papers (2024-12-27T23:50:54Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities.<n>It supports more critical tasks including data analysis, annotation, and foundation model post-training.<n>It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution.<n>We show that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over niave RAG approaches and 14.07% over pretraining methods in pass@10.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
Demystifying Application Programming Interfaces (APIs): Unlocking the Power of Large Language Models and Other Web-based AI Services in Social Work Research [0.0]
Application Programming Interfaces (APIs) are essential tools for social work researchers aiming to harness advanced technologies like Large Language Models (LLMs) and other AI services. This paper demystifies APIs and illustrates how they can enhance research methodologies. Practical code examples demonstrate how LLMs can generate API code for accessing specialized services, such as extracting data from unstructured text.
arXiv Detail & Related papers (2024-10-26T16:07:12Z)
ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability. Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents [7.166156709980112]
We introduce textscShortcutsBench, a large-scale benchmark for the comprehensive evaluation of API-based agents. textscShortcutsBench includes a wealth of real APIs from Apple Inc.'s operating systems. Our evaluation reveals significant limitations in handling complex queries related to API selection, parameter filling, and requesting necessary information from systems and users.
arXiv Detail & Related papers (2024-06-28T08:45:02Z)
Semantic API Alignment: Linking High-level User Goals to APIs [6.494714497852088]
We present a vision to span multiple steps from requirements engineering to implementation using existing libraries. This approach, which we call Semantic API Alignment (SEAL), aims to bridge the gap between a user's high-level goals and the specific functions of one or more APIs.
arXiv Detail & Related papers (2024-05-07T11:54:32Z)
Octopus: On-device language model for function calling of software APIs [9.78611123915888]
Large Language Models (LLMs) play a crucial role due to their advanced text processing and generation abilities. This study introduces a new strategy aimed at harnessing on-device LLMs in invoking software APIs.
arXiv Detail & Related papers (2024-04-02T01:29:28Z)
Adaptive REST API Testing with Reinforcement Learning [54.68542517176757]
Current testing tools lack efficient exploration mechanisms, treating all operations and parameters equally. Current tools struggle when response schemas are absent in the specification or exhibit variants. We present an adaptive REST API testing technique incorporates reinforcement learning to prioritize operations during exploration.
arXiv Detail & Related papers (2023-09-08T20:27:05Z)
Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation. Official API documentation serves as a primary source of information for developers, but it can often be extensive and lacks user-friendliness. Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z)
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing. LLMs are extremely computationally expensive, even at inference time. We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data [101.63682141248069]
Chat models, such as ChatGPT, have shown impressive capabilities and have been rapidly adopted across numerous domains. We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT. We employ parameter-efficient tuning to enhance LLaMA, an open-source large language model.
arXiv Detail & Related papers (2023-04-03T17:59:09Z)
On the Effectiveness of Pretrained Models for API Learning [8.788509467038743]
Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences.
arXiv Detail & Related papers (2022-04-05T20:33:24Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.