Fugu-MT Paper Translation (Abstract): Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data

Paper overview: Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data

arxiv url: http://arxiv.org/abs/2606.05781v1
Date: Thu, 04 Jun 2026 07:09:37 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.613419
Title: Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data
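Title (reference translation): Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning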
Authors: Srinivasan Manoharan, Dilipkumar Nallusamy, Sachin Kumar, Haifeng Wu,
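Abstract summary: This paper proposes a hybrid framework that combines a fine-tuned small language model with a deterministic rule-based post-processing layer. Running on a single NVIDIA A100 GPU, inference completes in approximately 2 seconds, which is 2-5x faster than frontier-model APIs. The results show that domain-adapted small language models, when combined with deterministic post-processing, can match frontier-model accuracy for structured compliance evaluation.
Reference score (independently computed attention score): 6.3745740668603075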
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deploying frontier large language models (LLMs) for domain-specific structured evaluation tasks often incurs substantial latency, cost, and data privacy overhead. We present a hybrid framework that combines a fine-tuned small language model (LLaMA 3.1 8B, with only 2.05% trainable parameters via LoRA) and a deterministic rule-based post-processing layer. Trained on just 219 curated examples, the system is applied to multi-label compliance evaluation of conversational transcripts spanning 18 heterogeneous output fields. In blind evaluation on 53 previously unseen production transcripts, it achieves 100% JSON structural validity, 83.0% human-validated overall accuracy, and 100% accuracy on the most critical classification field. The proposed approach formalizes a hybrid neural-symbolic decomposition and introduces targeted hard-negative augmentation to improve performance on critical decision boundaries. Running on a single NVIDIA A100 GPU, inference completes in approximately 2 seconds, which is 2-5x faster than frontier-model APIs. The system costs only $0.013 per evaluation compared with $0.025-$0.055 for proprietary alternatives, resulting in 46-76% cost savings. These results demonstrate that domain-adapted small language models, when combined with deterministic post-processing, can match frontier-model accuracy for structured compliance evaluation while substantially reducing operational cost, latency, and privacy risk. Keywords: small language models, parameter-efficient fine-tuning, LoRA, domain adaptation, hybrid inference, compliance evaluation, structured output.
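The hybrid neural-symbolic decomposition described in the abstract can be pictured as a model that emits JSON text followed by a deterministic layer that parses, validates, and normalizes it. The following is only a minimal sketch of that idea: the abstract does not list the 18 output fields or the actual rules, so the field names, allowed values, defaults, and the cross-field rule below are all hypothetical.

```python
import json

# Hypothetical schema: the abstract does not name the 18 output fields.
ALLOWED = {
    "compliance_status": {"compliant", "non_compliant"},
    "disclosure_given": {"yes", "no", "n/a"},
}
# Hypothetical fallbacks used when a field is missing or invalid.
DEFAULTS = {"compliance_status": "non_compliant", "disclosure_given": "n/a"}


def post_process(raw_model_output: str) -> dict:
    """Deterministically turn raw model text into a schema-valid record."""
    # 1. Extract the outermost JSON object, ignoring any text around it.
    start = raw_model_output.find("{")
    end = raw_model_output.rfind("}")
    try:
        parsed = json.loads(raw_model_output[start:end + 1])
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}

    # 2. Normalize each field and fall back to a default on invalid values.
    record = {}
    for field, allowed in ALLOWED.items():
        value = str(parsed.get(field, "")).strip().lower()
        record[field] = value if value in allowed else DEFAULTS[field]

    # 3. Apply a cross-field rule (hypothetical): a missing disclosure
    #    always forces a non-compliant verdict.
    if record["disclosure_given"] == "no":
        record["compliance_status"] = "non_compliant"
    return record


if __name__ == "__main__":
    noisy = 'Sure! {"compliance_status": "Compliant ", "disclosure_given": "NO"} done'
    print(json.dumps(post_process(noisy)))
```

Because step 2 always fills every field from a fixed schema, the returned record is serializable and schema-complete regardless of what the model produced, which is one way a pipeline of this shape could guarantee structural validity.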
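Abstract (reference translation): Deploying frontier large language models (LLMs) for domain-specific structured evaluation tasks often incurs substantial latency, cost, and data privacy overhead. This paper proposes a hybrid framework that combines a fine-tuned small language model (LLaMA 3.1 8B, with only 2.05% trainable parameters via LoRA) and a deterministic rule-based post-processing layer. Trained on 219 curated examples, the system is applied to multi-label compliance evaluation of conversational transcripts spanning 18 heterogeneous output fields. In blind evaluation on 53 previously unseen production transcripts, it achieves 100% JSON structural validity, 83.0% human-validated overall accuracy, and 100% accuracy on the most critical classification field. The proposed approach formalizes a hybrid neural-symbolic decomposition and introduces targeted hard-negative augmentation to improve performance on critical decision boundaries. Running on a single NVIDIA A100 GPU, inference completes in approximately 2 seconds, which is 2-5x faster than frontier-model APIs. The system costs $0.013 per evaluation, compared with $0.025-$0.055 for proprietary alternatives, a cost saving of 46-76%. These results show that domain-adapted small language models, when combined with deterministic post-processing, can match frontier-model accuracy for structured compliance evaluation while substantially reducing operational cost, latency, and privacy risk. Keywords: small language models, parameter-efficient fine-tuning, LoRA, domain adaptation, hybrid inference, compliance evaluation, structured output.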
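The cost and latency figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted above; the helper name `savings` is introduced here for illustration.

```python
def savings(ours: float, theirs: float) -> float:
    """Fractional cost reduction of `ours` relative to `theirs`."""
    return 1.0 - ours / theirs


SLM_COST = 0.013                      # USD per evaluation (from the abstract)
FRONTIER_COST_RANGE = (0.025, 0.055)  # USD per evaluation (from the abstract)
SLM_LATENCY_S = 2.0                   # approximate seconds (from the abstract)
SPEEDUP_RANGE = (2, 5)                # "2-5x faster" (from the abstract)

# Savings against the cheapest and the most expensive frontier option.
low, high = (savings(SLM_COST, c) for c in FRONTIER_COST_RANGE)

# Frontier-API latency implied by the stated speedup.
implied_latency = tuple(SLM_LATENCY_S * k for k in SPEEDUP_RANGE)

if __name__ == "__main__":
    print(f"cost savings: {low:.1%} to {high:.1%}")
    print(f"implied frontier latency: {implied_latency[0]:.0f}-{implied_latency[1]:.0f} s")
```

With these rounded inputs the savings come out at roughly 48% to 76%, and the implied frontier-API latency is about 4 to 10 seconds. The upper bound agrees with the stated 46-76% range; the small gap at the lower end may simply reflect rounding in the quoted per-evaluation prices.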

Related papers