Fugu-MT Paper Translation (Abstract): Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity

Paper overview: Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity

arxiv url: http://arxiv.org/abs/2605.21845v1
Date: Thu, 21 May 2026 00:33:52 GMT
Status: translation complete
Last updated in system: 2026-05-22 16:35:42.037612
Title: Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity
Authors: Geoffrey Martin, Xuan Zhong Feng, Yifan Peng
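Abstract summary: This study develops an algorithm that analyzes coding manual structure to predict when detailed prompts improve over name-only prompts. Large language models (LLMs) are evaluated against fine-tuned RoBERTa on 25 inferentially complex circumstances from the National Violent Death Reporting System.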
Reference score (independently computed attention level): 8.474809035213118
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Suicide is a leading cause of death in the United States, and understanding the circumstances that precede it requires extracting structured information from death investigation narratives. Many of these circumstances require semantic inference beyond simple keyword matching. We develop a "Complexity Score" algorithm that analyzes coding manual structure to predict when detailed prompts with full coding guidelines improve over name-only prompts. We then construct a hybrid approach that selects a prompt strategy per circumstance. We evaluate large language models (LLMs) against fine-tuned RoBERTa on 25 inferentially complex circumstances from the National Violent Death Reporting System (NVDRS). We find that LLMs substantially outperform the fine-tuned model on low-prevalence circumstances where training data is insufficient. We further demonstrate that our framework generalizes across frontier LLMs, with GPT-5.2, Gemini 2.5 Pro and Llama-3 70B showing consistent performance patterns. These findings support a hybrid architecture where LLMs handle rare, inferentially complex circumstances while fine-tuned models handle common ones.
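The hybrid idea described in the abstract (choose a prompt strategy per circumstance, and send rare circumstances to an LLM while common ones go to a fine-tuned model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how the Complexity Score is computed, gives no numeric cutoffs, and does not describe an implementation. The names `Circumstance`, `choose_prompt`, `choose_model`, `route`, and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical cutoffs; the abstract states no numeric thresholds.
COMPLEXITY_THRESHOLD = 0.5   # at or above this, assume the detailed prompt helps
PREVALENCE_THRESHOLD = 0.05  # below this, assume training data is insufficient


@dataclass(frozen=True)
class Circumstance:
    name: str
    complexity_score: float  # assumed to be precomputed from the coding manual
    prevalence: float        # assumed fraction of narratives carrying this label


def choose_prompt(c: Circumstance) -> str:
    """Select a prompt strategy for one circumstance."""
    if c.complexity_score >= COMPLEXITY_THRESHOLD:
        return "detailed"    # full coding guidelines
    return "name_only"       # circumstance name only


def choose_model(c: Circumstance) -> str:
    """Route rare circumstances to an LLM and common ones to a fine-tuned model."""
    return "llm" if c.prevalence < PREVALENCE_THRESHOLD else "fine_tuned"


def route(c: Circumstance) -> Tuple[str, Optional[str]]:
    """Return (model, prompt strategy); the prompt is None for the fine-tuned model."""
    model = choose_model(c)
    prompt = choose_prompt(c) if model == "llm" else None
    return model, prompt


if __name__ == "__main__":
    # Placeholder circumstances, not taken from the NVDRS coding manual.
    examples = [
        Circumstance("rare_complex", complexity_score=0.8, prevalence=0.01),
        Circumstance("rare_simple", complexity_score=0.2, prevalence=0.02),
        Circumstance("common", complexity_score=0.9, prevalence=0.30),
    ]
    for c in examples:
        print(c.name, route(c))
```

In this sketch the two decisions are kept as separate functions so that either cutoff could be tuned independently; whether the paper couples them in this way is not stated in the abstract.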


Related papers