Fugu-MT paper translation (abstract): LLM-Assisted Formalization Enables Deterministic Detection of Statutory Inconsistency in the Internal Revenue Code

Paper overview: LLM-Assisted Formalization Enables Deterministic Detection of Statutory Inconsistency in the Internal Revenue Code

arxiv url: http://arxiv.org/abs/2511.11954v1
Date: Sat, 15 Nov 2025 00:05:02 GMT
Status: translation complete
Last updated in system: 2025-11-18 14:36:23.419519
Title: LLM-Assisted Formalization Enables Deterministic Detection of Statutory Inconsistency in the Internal Revenue Code
Authors: Borchuluun Yadamsuren, Steven Keith Platt, Miguel Diaz
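Abstract summary: This study introduces a hybrid neuro-symbolic framework that deterministically detects statutory inconsistency in complex law. The U.S. Internal Revenue Code (IRC) serves as the case study.
Reference score (attention score computed independently by this site): 0.0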
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study introduces a hybrid neuro-symbolic framework that achieves deterministic detection of statutory inconsistency in complex law. We use the U.S. Internal Revenue Code (IRC) as a case study because its complexity makes it a fertile domain for identifying conflicts. Our research offers a solution for detecting inconsistent provisions by combining Large Language Models (LLMs) with symbolic logic. LLM-based methods can support compliance, fairness, and statutory drafting, yet tax-specific applications remain sparse. A key challenge is that such models struggle with hierarchical processing and deep structured reasoning, especially over long text. This research addresses these gaps through experiments using GPT-4o, GPT-5, and Prolog. GPT-4o was first used to translate Section 121 into Prolog rules and refine them in SWISH. These rules were then incorporated into prompts to test whether Prolog-augmented prompting improved GPT-4o's inconsistency detection. GPT-4o, whether prompted with natural language alone or with Prolog augmentation, detected the inconsistency in only one of three strategies (33 percent accuracy), but its reasoning quality differed: natural-language prompting achieved 100 percent rule coverage, while Prolog-augmented prompting achieved 66 percent, indicating more incomplete statutory analysis. In contrast to probabilistic prompting, the hybrid Prolog model produced deterministic and reproducible results. Guided by GPT-5 for refinement, the model formalized the IRC section's competing interpretations and successfully detected an inconsistency zone. Validation tests confirm that the Prolog implementation is accurate, internally consistent, deterministic, and capable of autonomously identifying inconsistencies. These findings show that LLM-assisted formalization, anchored in symbolic logic, enables transparent and reliable statutory inconsistency detection.
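The contrast the abstract draws between probabilistic prompting and deterministic symbolic checking can be illustrated with a minimal sketch. The paper itself uses Prolog; Python is swapped in here purely for illustration. The two "readings" below are hypothetical toy predicates and are not the Section 121 rules or the competing interpretations formalized in the paper. The function names, thresholds, and grid are all assumptions. The point is only that, once two interpretations are written as explicit rules, the set of fact patterns on which they disagree (an "inconsistency zone") can be enumerated exhaustively and reproducibly.

```python
from itertools import product


def reading_a(months_owned: int, months_used: int) -> bool:
    """Hypothetical reading A: each period must reach 24 months on its own."""
    return months_owned >= 24 and months_used >= 24


def reading_b(months_owned: int, months_used: int) -> bool:
    """Hypothetical reading B: the two periods combined must reach 48 months."""
    return months_owned + months_used >= 48


def inconsistency_zone(max_months: int = 60, step: int = 12):
    """Return every (owned, used) pair on the grid where the readings disagree."""
    grid = range(0, max_months + 1, step)
    return [
        (owned, used)
        for owned, used in product(grid, grid)
        if reading_a(owned, used) != reading_b(owned, used)
    ]


# The same call always yields the same list: no sampling is involved.
zone = inconsistency_zone()
for owned, used in zone:
    print(f"readings disagree at owned={owned}, used={used}")
```

Because the check is an exhaustive enumeration over explicit rules rather than a sampled model response, rerunning it returns the identical set of conflicting fact patterns, which is the reproducibility property the abstract attributes to the symbolic component.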

Related papers

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs [54.154593699263074]
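ProtoReasoning is a framework that enhances the reasoning ability of large reasoning models. It transforms problems into corresponding prototype representations. ProtoReasoning improves on baseline models by 4.7% on logical reasoning.
Paper reference translation (metadata) (2025-06-18T07:44:09Z)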
Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure [1.5653612447564105]
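We propose a compliance detection method based on natural language inference (NLI). We formulate the claim-argument-evidence structure of an assurance case as multi-hop inference to achieve explainable and traceable compliance detection. Our results highlight the potential of NLI-based approaches for automating regulatory compliance processes.
Paper reference translation (metadata) (2025-06-10T11:56:06Z)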
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection [60.98964268961243]
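We propose that guiding models to carry out a systematic and comprehensive reasoning process lets them make finer-grained and more accurate entailment decisions. We define a three-step reasoning process consisting of (i) claim decomposition, (ii) sub-claim attribution and entailment classification, and (iii) aggregated classification, and show that such guided reasoning does improve hallucination detection.
Paper reference translation (metadata) (2025-06-05T17:02:52Z)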
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks [12.436681393835626]
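Large language models (LLMs) promise to democratize automated reasoning by generating formal specifications. We introduce a probabilistic context-free grammar (PCFG) framework for modeling LLM outputs. Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (14-100%) with minimal abstention.
Paper reference translation (metadata) (2025-05-26T14:34:04Z)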
Towards Logically Sound Natural Language Reasoning with Logic-Enhanced Language Model Agents [3.5083201638203154]
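Logic-Enhanced Language Model Agents (LELMA) is a framework that integrates large language models with formal logic. LELMA uses autoformalization to translate reasoning into logic representations, which are then used to assess logical validity. LELMA achieves high accuracy in error detection and improves reasoning correctness through self-refinement.
Paper reference translation (metadata) (2024-08-28T18:25:35Z)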
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
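We introduce LogicAsker, a novel approach for evaluating and improving the logical reasoning ability of large language models (LLMs). The method reveals significant gaps in how LLMs learn logical rules, identifying reasoning failure rates of 29% to 90% across different models. Building on these findings, we construct targeted demonstration examples and fine-tuning data, improving logical reasoning in models such as GPT-4o by up to 5%.
Paper reference translation (metadata) (2024-01-01T13:53:53Z)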
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs [95.07757789781213]
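Two lines of approaches are adopted for complex reasoning with LLMs. One line of work prompts LLMs with various reasoning structures, whose structured outputs can naturally be regarded as intermediate reasoning steps. The other line uses LLM-free declarative solvers to perform the reasoning, which yields higher reasoning accuracy but lacks interpretability because of the black-box nature of the solvers. Specifically, we show that the intermediate search logs generated by a Prolog interpreter can be accessed and interpreted into human-readable reasoning.
Paper reference translation (metadata) (2023-11-16T11:26:21Z)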
From Ambiguity to Explicitness: NLP-Assisted 5G Specification Abstraction for Formal Analysis [5.526122280732959]
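We propose a two-step pipeline that uses NLP tools to construct data and then uses the constructed data to extract identifiers and formal properties. The best model reaches 39% accuracy for identifier extraction and 42% accuracy for formal property identification.
Paper reference translation (metadata) (2023-08-07T03:37:31Z)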

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
