Related papers: Tool Calling for Arabic LLMs: Data Strategies and Instruction Tuning

URL: http://arxiv.org/abs/2509.20957v1
Date: Thu, 25 Sep 2025 09:45:12 GMT
Title: Tool Calling for Arabic LLMs: Data Strategies and Instruction Tuning
Authors: Asim Ersoy, Enes Altinisik, Husrev Taha Sencar, Kareem Darwish,
Abstract summary: We bridge the resource gap by translating and adapting two open-source tool-calling datasets into Arabic. Our findings provide crucial insights into the optimal strategies for developing robust tool-augmented agents for Arabic.
Score: 8.009383136558823
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tool calling is a critical capability that allows Large Language Models (LLMs) to interact with external systems, significantly expanding their utility. However, research and resources for tool calling are predominantly English-centric, leaving a gap in our understanding of how to enable this functionality for other languages, such as Arabic. This paper investigates three key research questions: (1) the necessity of in-language (Arabic) tool-calling data versus relying on cross-lingual transfer, (2) the effect of general-purpose instruction tuning on tool-calling performance, and (3) the value of fine-tuning on specific, high-priority tools. To address these questions, we conduct extensive experiments using base and post-trained variants of an open-weight Arabic LLM. To enable this study, we bridge the resource gap by translating and adapting two open-source tool-calling datasets into Arabic. Our findings provide crucial insights into the optimal strategies for developing robust tool-augmented agents for Arabic.

Related papers

Arabic Prompts with English Tools: A Benchmark [0.20524609401792393]
This paper introduces the first benchmark for evaluating the tool-calling and agentic capabilities of large language models (LLMs) in Arabic. Our findings reveal a huge performance gap: when users interact in Arabic, tool-calling accuracy drops by an average of 5-10%, regardless of whether the tool descriptions themselves are in Arabic or English. By shedding light on these critical challenges, this benchmark aims to foster the development of more reliable and linguistically equitable AI agents for Arabic-speaking users.
arXiv Detail & Related papers (2026-01-08T16:47:09Z)
Re-Initialization Token Learning for Tool-Augmented Large Language Models [49.91503552002649]
Large language models have demonstrated exceptional performance, yet struggle with complex tasks such as numerical reasoning and plan generation. We propose a novel token learning method that aligns tool tokens with the existing word embedding space. We evaluate the method on tasks such as numerical reasoning, knowledge-based question answering, and embodied plan generation using GSM8K-XL, FuncQA, KAMEL, and VirtualHome datasets.
arXiv Detail & Related papers (2025-06-17T07:11:00Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space. MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
Scaffolded Language Models with Language Supervision for Mixed-Autonomy: A Survey [52.00674453604779]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
Enhancing Tool Retrieval with Iterative Feedback from Large Language Models [9.588592185027455]
Large language models (LLMs) can effectively handle a certain amount of tools through in-context learning or fine-tuning. In real-world scenarios, the number of tools is typically extensive and irregularly updated, emphasizing the necessity for a dedicated tool retrieval component. We propose to enhance tool retrieval with iterative feedback from the large language model.
arXiv Detail & Related papers (2024-06-25T11:12:01Z)
Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z)
Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models. Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions. We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER). Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline. Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z)
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark. Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs. We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
Unravelling Interlanguage Facts via Explainable Machine Learning [10.71581852108984]
We focus on the internals of an NLI classifier trained by an explainable machine learning algorithm. We use this perspective in order to tackle both NLI and a companion task, guessing whether a text has been written by a native or a non-native speaker. We investigate which kinds of linguistic traits are most effective for solving our two tasks, namely, are most indicative of a speaker's L1.
arXiv Detail & Related papers (2022-08-02T14:05:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.