Fugu-MT Paper Translation (Summary): Advancing SLM Tool-Use Capability using Reinforcement Learning

Paper Summary: Advancing SLM Tool-Use Capability using Reinforcement Learning

arxiv url: http://arxiv.org/abs/2509.04518v1
Date: Wed, 03 Sep 2025 07:41:14 GMT
Status: Translation complete
Last updated in system: 2025-09-08 14:27:25.353768
Title: Advancing SLM Tool-Use Capability using Reinforcement Learning
Title (reference translation): Improving SLM Tool-Use Capability Using Reinforcement Learning
Authors: Dhruvi Paprunia, Vansh Kharidia, Pankti Doshi,
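Abstract summary: Small language models (SLMs) struggle with tool use compared to large language models (LLMs). This study addresses these challenges using reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO). Unlike conventional fine-tuning methods, which are computationally heavy and lack adaptability, this method is an efficient and effective solution.
Reference score (independently calculated attention score): 0.0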
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have progressed beyond simple text creation, and tool use has become increasingly important for complex, real-world tasks. Tool use in LLMs refers to their ability to utilize external resources such as APIs, databases, or software functions to extend their functionality beyond generating text. Tools are used for tasks such as performing calculations, making API calls to retrieve the current time and date, and more. This capability enables models to fetch real-time data, execute commands, or solve problems requiring dynamic interaction, making it indispensable for applications like AI agents in virtual assistants, robotic control, or automated workflows. However, while LLMs are usually adept at tool use, their vast resource requirements and computational complexity keep them from being practical in every use case. As a result, there is an increasing need for more compact and efficient Small Language Models (SLMs). As shown in Table 1, SLMs struggle with tool use compared to LLMs. SLMs are typically trained on smaller, more specific datasets, resulting in a narrower knowledge base and more limited contextual understanding than LLMs. This research addresses these challenges by using Reinforcement Learning (RL), specifically Group Relative Policy Optimization (GRPO), to enhance tool-use proficiency in SLMs. Unlike conventional fine-tuning approaches that require heavy computation and often lack adaptability, our method provides an efficient, effective solution that significantly boosts SLM tool-use accuracy, increasing their practical utility.
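Abstract (reference translation): Large language models (LLMs) have advanced beyond simple text generation, and tool use has become increasingly important in complex real-world tasks. Tool use in LLMs refers to the ability to extend their functionality beyond text generation by using external resources such as APIs, databases, and software functions; tools are used for tasks such as performing calculations and retrieving the current date and time through API calls. This capability lets models fetch real-time data, execute commands, and solve problems that require dynamic interaction, making it indispensable for applications such as AI agents in virtual assistants, robotic control, and automated workflows. However, although LLMs are usually well suited to tool use, their enormous resource requirements and computational complexity limit where they can be used, which increases the need for more compact and efficient small language models (SLMs). As shown in Table 1, SLMs struggle with tool use compared to LLMs. Because SLMs are usually trained on smaller, more specific datasets, their knowledge base is narrower and their contextual understanding more limited than those of LLMs. This study addresses these challenges by using reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), to improve tool-use capability in SLMs. Unlike conventional fine-tuning methods, which are computationally heavy and lack adaptability, this method provides an efficient and effective solution that substantially improves SLM tool-use accuracy and practical utility.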
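The abstract names GRPO but does not show how it turns sampled outputs into a training signal. The following is a minimal sketch, assuming a JSON tool-call format and a three-level reward that are hypothetical illustrations, not the paper's actual setup. Several completions are sampled for one prompt, each is scored, and each score is normalized against its group's mean and standard deviation. GRPO uses this group-relative advantage in place of a learned value function.

```python
import json
import statistics
from typing import Optional


def parse_tool_call(text: str) -> Optional[dict]:
    """Parse a JSON tool call such as {"name": "add", "arguments": {"a": 2, "b": 3}}.

    The call format is a hypothetical assumption. Returns None if malformed.
    """
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    if not isinstance(call.get("name"), str) or not isinstance(call.get("arguments"), dict):
        return None
    return call


def tool_call_reward(completion: str, expected: dict) -> float:
    """Hypothetical reward: 0.0 if unparseable, 0.5 if well-formed but wrong, 1.0 if exact match."""
    call = parse_tool_call(completion)
    if call is None:
        return 0.0
    if call["name"] == expected["name"] and call["arguments"] == expected["arguments"]:
        return 1.0
    return 0.5


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    expected = {"name": "add", "arguments": {"a": 2, "b": 3}}
    # A group of completions sampled for the same prompt
    group = [
        '{"name": "add", "arguments": {"a": 2, "b": 3}}',
        '{"name": "multiply", "arguments": {"a": 2, "b": 3}}',
        "The answer is 5",
        '{"name": "add", "arguments": {"a": 2, "b": 3}}',
    ]
    rewards = [tool_call_reward(c, expected) for c in group]
    advantages = group_relative_advantages(rewards)
    for c, r, a in zip(group, rewards, advantages):
        print(f"reward={r:.1f} advantage={a:+.3f}  {c}")
```

Completions that beat their group's average receive positive advantages and are reinforced. Those below average are pushed down. If every completion in a group scores the same, all advantages are zero. The policy-gradient update, clipping, and KL penalty that complete a GRPO step are omitted here.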
