Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
- URL: http://arxiv.org/abs/2504.19277v1
- Date: Sun, 27 Apr 2025 15:26:51 GMT
- Title: Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
- Authors: Ishan Kavathekar, Raghav Donakanti, Ponnurangam Kumaraguru, Karthik Vaidhyanathan,
- Abstract summary: Function calling is a complex task with widespread applications in domains such as information retrieval, software engineering and automation.<n>Large Language Models (LLMs) can automate this process but are computationally expensive and impractical in resource-constrained settings.<n>Small Language Models (SLMs) can operate efficiently, offering faster response times, and lower computational demands.
- Score: 6.102559098873098
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Function calling is a complex task with widespread applications in domains such as information retrieval, software engineering and automation. For example, a query to book the shortest flight from New York to London on January 15 requires identifying the correct parameters to generate accurate function calls. Large Language Models (LLMs) can automate this process but are computationally expensive and impractical in resource-constrained settings. In contrast, Small Language Models (SLMs) can operate efficiently, offering faster response times, and lower computational demands, making them potential candidates for function calling on edge devices. In this exploratory empirical study, we evaluate the efficacy of SLMs in generating function calls across diverse domains using zero-shot, few-shot, and fine-tuning approaches, both with and without prompt injection, while also providing the finetuned models to facilitate future applications. Furthermore, we analyze the model responses across a range of metrics, capturing various aspects of function call generation. Additionally, we perform experiments on an edge device to evaluate their performance in terms of latency and memory usage, providing useful insights into their practical applicability. Our findings show that while SLMs improve from zero-shot to few-shot and perform best with fine-tuning, they struggle significantly with adhering to the given output format. Prompt injection experiments further indicate that the models are generally robust and exhibit only a slight decline in performance. While SLMs demonstrate potential for the function call generation task, our results also highlight areas that need further refinement for real-time functioning.
Related papers
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling.
It focuses on specific logical and mathematical reasoning tasks.
The approach aims to improve performances of small-scale models for these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z) - Hammer: Robust Function-Calling for On-Device Language Models via Function Masking [26.495781685810044]
Hammer is a novel family of foundation models specifically engineered for on-device function calling.
Our empirical evaluations reveal that Hammer not only outperforms larger models but also demonstrates robust generalization across diverse benchmarks.
arXiv Detail & Related papers (2024-10-06T18:57:46Z) - EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Octopus v2: On-device language model for super agent [10.998608318944985]
Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency.
When compared to Llama-7B with a RAG-based function calling mechanism, our method enhances latency by 35-fold.
arXiv Detail & Related papers (2024-04-02T09:01:32Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - FIND: A Function Description Benchmark for Evaluating Interpretability
Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z) - Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation
with Large Language Models [116.25562358482962]
State-of-the-art neural language models can be used to solve ad-hoc language tasks without the need for supervised training.
PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts.
arXiv Detail & Related papers (2022-08-16T17:17:53Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.