Fugu-MT Paper Translation (Overview): AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

Paper overview: AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

arxiv url: http://arxiv.org/abs/2606.24245v1
Date: Tue, 23 Jun 2026 07:31:03 GMT
Status: Translation complete
Last updated (in system): 2026-06-24 22:16:48.826742
Title: AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming
Authors: Pingchuan Ma, Zhaoyu Wang, Zimo Ji, Yuguang Zhou, Zhantong Xue, Zongjie Li, Shuai Wang, Xiaoqin Zhang,
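Abstract summary: Existing safety approaches face a fundamental tradeoff. We present AutoSpec, a framework that automatically evolves deployed expert-designed safety rules from user safe/unsafe annotations. We evaluate AutoSpec on 291 execution traces spanning code execution and embodied agent domains.
Reference score (attention score computed by the site): 21.12573593471532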
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents increasingly automate complex tasks by integrating language models with external tools and environments. However, their autonomy poses significant safety risks: agents may execute destructive commands, leak sensitive data, or violate domain constraints. Existing safety approaches face a fundamental tradeoff: hand-crafted rules are interpretable but brittle, with overly conservative rules blocking safe operations (high false positives) while permissive rules miss unsafe behaviors (high false negatives). Neural classifiers lack the interpretability required for safety-critical deployments. We present AutoSpec, a framework that automatically evolves deployed expert-designed safety rules from user safe/unsafe annotations through counterexample-guided inductive synthesis (CEGIS) guided by inductive logic programming (ILP). Starting from the expert rules and a stream of annotated traces, AutoSpec iteratively evaluates rules, mines false-positive and false-negative counterexamples, uses ILP to learn which predicates discriminate them, generates candidate rule edits, and verifies candidates to select the best revision. The key insight is that ILP efficiently identifies predicates that appear frequently in false negatives but rarely in false positives (or vice versa), dramatically pruning the exponential search space of rule edits. This continues until convergence, producing interpretable rules that balance precision and recall. We evaluate AutoSpec on 291 execution traces spanning code execution and embodied agent domains. AutoSpec raises rule F1 to 0.98 and 0.93 across the two domains, achieving up to 94% false positive reduction while maintaining high recall, and converges within 4-5 iterations. The ILP-guided approach achieves up to 4.8x higher F1 than heuristic CEGIS. The learned rules are human-readable, auditable, and generalize to unseen scenarios.
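The abstract describes the AutoSpec loop only at a high level. The Python sketch below illustrates one possible shape of that loop under loudly stated assumptions: a trace is encoded as a set of ground predicates plus a boolean "unsafe" annotation, a rule is a conjunction of required and forbidden predicates, and the ILP learner is replaced by a plain frequency-contrast score that ranks predicates common in false negatives but rare in false positives (or vice versa). All names (`Rule`, `contrast`, `candidate_edits`, `evolve`) and the toy traces are hypothetical; this is not the authors' implementation, and it processes a fixed batch of traces rather than a stream.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a trace is the set of ground predicates it satisfies,
# paired with the user's annotation (True = unsafe).
Trace = Tuple[FrozenSet[str], bool]


@dataclass(frozen=True)
class Rule:
    """Block a trace when all `pos` predicates hold and no `neg` predicate holds."""
    pos: FrozenSet[str]
    neg: FrozenSet[str] = frozenset()

    def fires(self, preds: FrozenSet[str]) -> bool:
        return self.pos <= preds and not (self.neg & preds)


def blocks(rules: List[Rule], preds: FrozenSet[str]) -> bool:
    return any(r.fires(preds) for r in rules)


def f1(rules: List[Rule], traces: List[Trace]) -> float:
    tp = sum(unsafe and blocks(rules, p) for p, unsafe in traces)
    fp = sum(not unsafe and blocks(rules, p) for p, unsafe in traces)
    fn = sum(unsafe and not blocks(rules, p) for p, unsafe in traces)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def contrast(group_a, group_b, top_k=3):
    """Predicates more frequent in group_a than in group_b, best first.

    A frequency-difference heuristic standing in for the paper's ILP step.
    """
    def freq(p, group):
        return sum(p in t for t in group) / len(group) if group else 0.0

    scores = {p: freq(p, group_a) - freq(p, group_b) for p in set().union(*group_a)}
    ranked = sorted((p for p, s in scores.items() if s > 0), key=lambda p: (-scores[p], p))
    return ranked[:top_k]


def candidate_edits(rules, traces):
    # Mine counterexamples under the current rules.
    fps = [p for p, unsafe in traces if not unsafe and blocks(rules, p)]
    fns = [p for p, unsafe in traces if unsafe and not blocks(rules, p)]
    cands = []
    # Generalize: add a rule keyed on a predicate common in FNs, rare in FPs.
    for p in contrast(fns, fps):
        cands.append(rules + [Rule(frozenset({p}))])
    # Specialize: add an exception (predicate common in FPs, rare in FNs)
    # to each rule that fires on at least one false positive.
    for i, r in enumerate(rules):
        if any(r.fires(t) for t in fps):
            for p in contrast(fps, fns):
                if p not in r.pos:
                    cands.append(rules[:i] + [Rule(r.pos, r.neg | {p})] + rules[i + 1:])
    return cands


def evolve(rules, traces, max_iters=10):
    """Greedy CEGIS-style loop: propose edits, verify each by F1, keep the best."""
    best, best_f1 = list(rules), f1(rules, traces)
    for _ in range(max_iters):
        scored = [(f1(c, traces), c) for c in candidate_edits(best, traces)]
        if not scored:
            break  # no counterexamples left to learn from
        top_f1, top = max(scored, key=lambda s: s[0])
        if top_f1 <= best_f1:
            break  # converged: no candidate edit improves F1
        best, best_f1 = top, top_f1
    return best, best_f1


traces = [
    (frozenset({"cmd_rm", "path_root"}), True),
    (frozenset({"cmd_rm", "path_home"}), True),
    (frozenset({"cmd_rm", "path_tmp"}), False),
    (frozenset({"cmd_rm", "path_tmp", "flag_force"}), False),
    (frozenset({"cmd_curl", "reads_secret"}), True),
    (frozenset({"cmd_cat", "reads_secret", "net_upload"}), True),
    (frozenset({"cmd_ls", "path_home"}), False),
    (frozenset({"cmd_curl", "path_tmp"}), False),
]
expert = [Rule(frozenset({"cmd_rm"}))]  # hypothetical expert rule: block every rm
rules, score = evolve(expert, traces)
print(f1(expert, traces), score)  # 0.5 1.0
print(rules)
```

In this toy run the hypothetical expert rule blocks every `rm`, so it flags safe temp-file deletions (false positives) and misses secret exfiltration (false negatives). The first iteration adds a rule on `reads_secret`; the second gives the `rm` rule a `path_tmp` exception, after which no counterexamples remain. Accepting an edit only when it raises F1 on the annotated traces plays the role of the verification step; checking generalization would additionally require held-out traces.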


Related papers