Fugu-MT Paper Translation (Abstract): Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions

Paper overview: Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions

arxiv url: http://arxiv.org/abs/2603.05895v1
Date: Fri, 06 Mar 2026 04:26:53 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:45.082435
Title: Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions
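Title (reference translation): Building an LLM Semantic Tagger for UN Security Council Resolutions
Authors: Hussein Ghaly
Abstract summary: This paper proposes a new methodology that uses LLMs for semantic tagging of UN Security Council resolutions. The main goal is to leverage LLM performance variability to build ensemble systems for data cleaning and semantic tagging tasks.
Reference score (independently computed attention score): 0.0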
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces a new methodology for using LLM-based systems for accurate and efficient semantic tagging of UN Security Council resolutions. The main goal is to leverage LLM performance variability to build ensemble systems for data cleaning and semantic tagging tasks. We introduce two evaluation metrics: Content Preservation Ratio (CPR) and Tag Well-Formedness (TWF), in order to avoid hallucinations and unnecessary additions or omissions to the input text beyond the task requirement. These metrics allow the selection of the best output from multiple runs of several GPT models. GPT-4.1 achieved the highest metrics for both tasks (Cleaning: CPR 84.9% - Semantic Tagging: CPR 99.99% and TWF 99.92%). In terms of cost, smaller models, such as GPT-4.1-mini, achieved comparable performance to the best model in each task at only 20% of the cost. These metrics ultimately allowed the ensemble to select the optimal output (both cleaned and tagged content) for all the LLM models involved, across multiple runs. With this ensemble design and the use of metrics, we create a reliable LLM system for performing semantic tagging on challenging texts.
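Abstract (reference translation): This paper proposes a new methodology for using LLM-based systems for accurate and efficient semantic tagging of UN Security Council resolutions. The main goal is to leverage LLM performance variability to build ensemble systems for data cleaning and semantic tagging tasks. We introduce two evaluation metrics: Content Preservation Ratio (CPR) and Tag Well-Formedness (TWF). These metrics make it possible to select the best output from multiple runs of several GPT models. GPT-4.1 achieved the highest metrics on both tasks (Cleaning: CPR 84.9%; Semantic Tagging: CPR 99.99% and TWF 99.92%). In terms of cost, smaller models such as GPT-4.1-mini achieved performance comparable to the best model in each task at 20% of the cost. These metrics ultimately allowed the ensemble to select the optimal output (both cleaned and tagged content) across all the LLM models involved and across multiple runs. With this ensemble design and the use of metrics, we build a reliable LLM system for semantic tagging of challenging texts.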
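
The abstract names the two metrics but does not define them, so the following is only a minimal sketch of how metric-based selection among candidate outputs could look. Everything in it is an assumption made for illustration, not the paper's method: CPR is approximated as a character-level similarity between the input and the output with tags stripped, TWF as the fraction of XML-style tags that close in properly nested order, and the combined score as their product. The function names and the `<action>`/`<date>` tag names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Matches XML-style tags: opening, closing, or self-closing.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def strip_tags(text: str) -> str:
    """Remove all tags, leaving only the text content."""
    return TAG_RE.sub("", text)


def normalize(text: str) -> str:
    """Collapse whitespace so layout differences are not penalized."""
    return " ".join(text.split())


def content_preservation_ratio(source: str, output: str) -> float:
    """Assumed CPR: similarity of the source to the tag-stripped output."""
    a = normalize(source)
    b = normalize(strip_tags(output))
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def tag_well_formedness(output: str) -> float:
    """Assumed TWF: share of tags that are correctly opened and closed."""
    stack = []       # names of currently open tags
    total = 0        # every tag seen
    well_formed = 0  # tags that belong to a correctly nested pair
    for match in TAG_RE.finditer(output):
        closing, name, self_closing = match.groups()
        total += 1
        if self_closing:
            well_formed += 1
        elif not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
            well_formed += 2  # the opening tag and this closing tag
        # Otherwise: a closing tag with no matching open tag on top.
    # Tags left on the stack were never closed and earn no credit.
    return well_formed / total if total else 1.0


def select_best(source: str, candidates: list[str]) -> str:
    """Pick the candidate with the highest CPR * TWF (assumed combination)."""
    return max(
        candidates,
        key=lambda out: content_preservation_ratio(source, out)
        * tag_well_formedness(out),
    )


source = "The Security Council decides to extend the mandate until 31 March 2026."
good = ("The Security Council <action>decides</action> to extend the mandate "
        "until <date>31 March 2026</date>.")
bad = ("The Council <action>decides to extend the mandate "
       "until <date>31 March 2026</action>.")
best = select_best(source, [bad, good])
```

Here `bad` both drops a word from the input and closes its tags out of order, so `good` is selected. Under these assumed definitions an output with no tags at all also scores 1.0 on both metrics, so a real system would need a separate check that tagging actually took place.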

Related papers