Fugu-MT Paper Translation (Summary): Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs

Paper summary: Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs

arxiv url: http://arxiv.org/abs/2510.03847v1
Date: Sat, 04 Oct 2025 15:48:04 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.296119
Title: Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs
Authors: Raghav Sharma, Manan Mehta
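Abstract summary: Small language models (SLMs; 1-12B parameters, sometimes up to 20B) are sufficient, and often superior, for agentic workloads. We synthesize recent evidence across open and proprietary SLMs and connect it to modern evaluations. We formalize SLM-default, LLM-fallback systems that use uncertainty-aware routing and verifier cascades, and propose engineering metrics that reflect real production goals.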
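The summary describes SLM-default, LLM-fallback systems with uncertainty-aware routing and verifier cascades. The following is a minimal sketch of that control flow, not the authors' implementation. It assumes each model call returns a single scalar `confidence` as its uncertainty signal, and that a tool call is JSON with `name` and `arguments` keys. `ModelResult`, `TOOLS`, `valid_tool_call`, `route`, and the 0.7 threshold are all hypothetical names and values chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelResult:
    text: str          # raw model output, expected to be a JSON tool call
    confidence: float  # hypothetical scalar uncertainty signal in [0, 1]
    cost_usd: float    # cost of this single model call

Model = Callable[[str], ModelResult]
Verifier = Callable[[str], bool]

# Hypothetical type-safe function registry: tool name -> required argument names.
TOOLS: dict[str, set[str]] = {"get_weather": {"city"}}

def valid_tool_call(text: str) -> bool:
    """Validator-first check: JSON that names a registered tool and supplies its required arguments."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name, args = call.get("name"), call.get("arguments")
    return (isinstance(name, str) and name in TOOLS
            and isinstance(args, dict) and TOOLS[name].issubset(args))

def route(prompt: str, slm: Model, llm: Model,
          verifiers: list[Verifier], threshold: float = 0.7) -> tuple[str, str, float]:
    """Try the SLM first; fall back to the LLM if confidence is low or any verifier rejects the output."""
    first = slm(prompt)
    # all() stops at the first failing verifier, so cheap checks should be listed first.
    if first.confidence >= threshold and all(v(first.text) for v in verifiers):
        return "slm", first.text, first.cost_usd
    second = llm(prompt)
    # The rejected SLM attempt still counts toward the request's total cost.
    return "llm", second.text, first.cost_usd + second.cost_usd

if __name__ == "__main__":
    CALL = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    def confident_slm(p: str) -> ModelResult: return ModelResult(CALL, 0.9, 0.001)
    def broken_slm(p: str) -> ModelResult: return ModelResult("not json", 0.9, 0.001)
    def big_llm(p: str) -> ModelResult: return ModelResult(CALL, 0.99, 0.02)
    print(route("Weather in Paris?", confident_slm, big_llm, [valid_tool_call])[0])  # slm
    print(route("Weather in Paris?", broken_slm, big_llm, [valid_tool_call])[0])     # llm
```

A real system would likely replace the scalar threshold with a calibrated uncertainty estimate and add more verifiers, such as full JSON Schema validation or a dry-run execution. The cascade shape stays the same.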
Reference score (attention score computed by this site): 0.10742675209112619
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Small language models (SLMs; 1-12B params, sometimes up to 20B) are sufficient and often superior for agentic workloads where the objective is schema- and API-constrained accuracy rather than open-ended generation. We synthesize recent evidence across open and proprietary SLMs (Phi-4-Mini, Qwen-2.5-7B, Gemma-2-9B, Llama-3.2-1B/3B, Ministral-3B/8B, Apple on-device 3B, DeepSeek-R1-Distill) and connect it to modern evaluations (BFCL v3/v4, StableToolBench) and serving stacks (vLLM, SGLang, TensorRT-LLM) paired with guided decoding libraries (XGrammar, Outlines). We formalize SLM-default, LLM-fallback systems with uncertainty-aware routing and verifier cascades, and propose engineering metrics that reflect real production goals: cost per successful task (CPS), schema validity rate, executable call rate, p50/p95 latency, and energy per request. Guided decoding, strict JSON Schema outputs, and validator-first tool execution close much of the capability gap with larger models and often let SLMs match or surpass LLMs on tool use, function calling, and RAG at 10x-100x lower token cost with materially better latency and energy. We provide design patterns for agent stacks that prioritize SLMs: schema-first prompting, type-safe function registries, confidence scoring with verifier rollups, and lightweight adaptation via LoRA/QLoRA. We also delineate limits where fallback remains valuable (open-domain reasoning and some long-horizon planning). The result is a practical blueprint for building fast, inexpensive, and reliable agents that default to SLMs while preserving headroom with targeted LLM assistance. Keywords: small language models, agents, function calling, structured outputs, JSON Schema, guided decoding, LoRA/QLoRA, routing, energy efficiency, edge inference
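The abstract proposes several production metrics: cost per successful task (CPS), schema validity rate, executable call rate, p50/p95 latency, and energy per request. The following is a minimal sketch of how these might be computed from per-request logs. It assumes CPS means total spend, including failed requests and fallbacks, divided by the number of successful tasks. It also assumes latency percentiles use the nearest-rank convention. The paper may define either metric differently. `RequestLog`, `nearest_rank`, and `summarize` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class RequestLog:
    cost_usd: float      # total spend for the request, including any LLM fallback
    latency_ms: float    # end-to-end latency
    energy_j: float      # measured energy for the request, in joules
    schema_valid: bool   # output parsed and conformed to the JSON Schema
    executable: bool     # the emitted tool call executed without error
    task_success: bool   # end-to-end task judged successful

def nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile; interpolating conventions give slightly different numbers."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]

def summarize(logs: list[RequestLog]) -> dict[str, float]:
    n = len(logs)
    successes = sum(r.task_success for r in logs)
    total_cost = sum(r.cost_usd for r in logs)
    latencies = [r.latency_ms for r in logs]
    return {
        # CPS: all spend, failures included, divided by successful tasks only.
        "cps_usd": total_cost / successes if successes else math.inf,
        "schema_validity_rate": sum(r.schema_valid for r in logs) / n,
        "executable_call_rate": sum(r.executable for r in logs) / n,
        "p50_latency_ms": nearest_rank(latencies, 50),
        "p95_latency_ms": nearest_rank(latencies, 95),
        "energy_per_request_j": sum(r.energy_j for r in logs) / n,
    }
```

Dividing total spend by successes, rather than by requests, penalizes a cheap model that fails often. This is the point of CPS: a model that is 10x cheaper per token but succeeds half as often is only 5x cheaper per successful task.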


Related papers