Fugu-MT Paper Translation (Overview): Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction

Paper overview: Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction

arxiv url: http://arxiv.org/abs/2603.10047v1
Date: Sun, 08 Mar 2026 19:15:13 GMT
Status: Translation complete
Last updated (in system): 2026-03-12 16:22:32.589872
Title: Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction
Authors: Brian Freeman, Adam Kicklighter, Matt Erdman, Zach Gordon
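Abstract summary: Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. We present and compare five prompt engineering strategies for reducing the variance of model outputs.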
Reference score (in-house attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. They are persistent obstacles in high-stakes industrial settings such as engineering design, enterprise resource planning, and IoT telemetry platforms. We present and compare five prompt engineering strategies intended to reduce the variance of model outputs and move toward repeatable, grounded results without modifying model weights or creating complex validation models. These methods are: (M1) Iterative Similarity Convergence, (M2) Decomposed Model-Agnostic Prompting, (M3) Single-Task Agent Specialization, (M4) Enhanced Data Registry, and (M5) Domain Glossary Injection. Each method is evaluated against an internal baseline using an LLM-as-Judge framework over 100 repeated runs per method (same fixed task prompt, stochastic decoding at τ = 0.7). Under this evaluation setup, M4 (Enhanced Data Registry) received "Better" verdicts in all 100 trials; M3 and M5 reached 80% and 77%, respectively; M1 reached 75%; and M2 was net negative at 34% when compared to single-shot prompting with a modern foundation model. We then developed enhanced version 2 (v2) implementations and assessed them on a 10-trial verification batch; M2 recovered from 34% to 80%, the largest gain among the four revised methods. We discuss how these strategies help overcome the non-deterministic nature of LLM results for industrial procedures, even when absolute correctness cannot be guaranteed. We provide pseudocode, verbatim prompts, and batch logs to support independent assessment.
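The evaluation samples with stochastic decoding at temperature τ = 0.7, which is why one fixed prompt can produce different outputs across repeated runs. As a minimal sketch of what that parameter does (standard temperature-scaled softmax sampling, not code from the paper), the example below divides hypothetical token logits by τ before normalizing. The logits and the function names `softmax_with_temperature` and `sample_token` are illustrative assumptions.

```python
import math
import random


def softmax_with_temperature(logits, tau):
    """Turn raw logits into probabilities after dividing each logit by tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, tau, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, tau)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for tau in (0.1, 0.7, 1.5):
        probs = softmax_with_temperature(logits, tau)
        print(tau, [round(p, 3) for p in probs])
    rng = random.Random(0)  # seeded only so this demo is repeatable
    draws = [sample_token(logits, 0.7, rng) for _ in range(100)]
    print("distinct tokens over 100 draws at tau=0.7:", sorted(set(draws)))
```

Smaller τ concentrates probability on the highest-scoring token, while larger τ flattens the distribution; at τ = 0.7 lower-ranked tokens keep a non-trivial share, so repeated generations can diverge even with an identical prompt.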

Related papers