Fugu-MT Paper Translation (Abstract): TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

Paper abstract: TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

arxiv url: http://arxiv.org/abs/2606.12841v1
Date: Thu, 11 Jun 2026 03:09:57 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.556847
Title: TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models
Title (reference translation): TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models
Authors: Zhengtao Yao, Liuyang Song, Hongbo Zhang, Chenhao Wei, Haoyan Xu, Guang Yang, Siheng Wang,
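Abstract summary: TimeROME-DLM is a training-free, gradient-free, inference-time knowledge-editing framework for MDLMs. TimeROME-DLM closes the locate-then-edit gap between AR LLMs and MDLMs at a fraction of the computational cost.
Reference score (independently computed attention metric): 7.556462661029354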
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Masked diffusion language models (MDLMs) such as LLaDA now rival autoregressive (AR) LLMs, but every existing knowledge-editing and unlearning method (ROME, MEMIT, etc.) targets AR transformers and either makes assumptions that fail under iterative denoising, or requires gradient updates whose backward-pass activations cost tens of GB of extra VRAM and which collapse MDLMs at standard learning rates. We introduce TimeROME-DLM, the first training-free, gradient-free, inference-time knowledge-editing framework for MDLMs. It couples two components: a Temporal Indirect Effect (TIE) causal-tracing protocol that identifies, for each fact, the coordinate whose intervention most strongly drives the object prediction at later denoising steps; and a closed-form, low-rank residual edit memory that aggregates subject keys and target deltas across all forget facts and applies a single ridge-regularised update at that coordinate at every diffusion forward, with sparsification to limit utility spillover. Backbone weights stay frozen; only three hyperparameters (alpha, lambda, q) are tuned on a small validation split. On TOFU forget01 with TOFU-finetuned LLaDA-8B-Base, TimeROME-DLM cuts forget-set log-probability by roughly 83 nats. The same configuration transfers to LLaDA-8B-Instruct, Dream-7B, MMaDA-8B, DiffuLLaMA-7B, and LLaDA-MoE-1.4B. It keeps retain-set log-probability nearly flat (within ~1 nat at the utility-safe operating point) across 50 sequentially inserted facts, delivers a four- to fourteen-fold wall-clock speedup with zero additional VRAM over the strongest converged training-time baseline, and scales sub-linearly to 400 facts. TimeROME-DLM closes the locate-then-edit gap between AR LLMs and MDLMs at a fraction of the computational cost.
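Abstract (reference translation): Masked diffusion language models (MDLMs) such as LLaDA now rival autoregressive (AR) LLMs, but every existing knowledge-editing and unlearning method (ROME, MEMIT, etc.) targets AR transformers and either makes assumptions that fail under iterative denoising or requires gradient updates whose backward-pass activations cost tens of GB of extra VRAM and which collapse MDLMs at standard learning rates. We introduce TimeROME-DLM, a training-free, gradient-free, inference-time knowledge-editing framework for MDLMs. The Temporal Indirect Effect (TIE) causal-tracing protocol identifies, for each fact, the coordinate whose intervention most strongly drives the object prediction at later denoising steps. Only three hyperparameters (alpha, lambda, q) are tuned on a small validation split. On TOFU forget01 with TOFU-finetuned LLaDA-8B-Base, TimeROME-DLM cuts forget-set log-probability by roughly 83 nats. The same configuration transfers to LLaDA-8B-Instruct, Dream-7B, MMaDA-8B, DiffuLLaMA-7B, and LLaDA-MoE-1.4B. It keeps retain-set log-probability nearly flat (within ~1 nat at the utility-safe operating point) across 50 sequentially inserted facts, delivers a four- to fourteen-fold wall-clock speedup with zero additional VRAM over the strongest converged training-time baseline, and scales sub-linearly to 400 facts. TimeROME-DLM closes the locate-then-edit gap between AR LLMs and MDLMs at a fraction of the computational cost.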
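
The abstract describes a closed-form, low-rank, ridge-regularised edit memory built from subject keys and target deltas, followed by sparsification, but it does not give the formulas. The following is a minimal NumPy sketch of one way such an update could look, assuming a standard ridge regression from keys to deltas. The function names, array shapes, and the roles given to alpha (edit strength), lambda (ridge penalty), and q (sparsification quantile) are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def build_edit_memory(keys, deltas, lam=1e-2):
    """Closed-form ridge solution mapping subject keys to target deltas.

    keys:   (n_facts, d) hidden states at the traced coordinate (assumed).
    deltas: (n_facts, d) desired residual changes per fact (assumed).
    Returns M of shape (d, d) with rank <= n_facts that minimises
    ||M K^T - D^T||_F^2 + lam * ||M||_F^2.
    """
    n_facts = keys.shape[0]
    # Dual form: M = D^T (K K^T + lam I)^{-1} K, which only needs an
    # (n_facts x n_facts) solve instead of a (d x d) one.
    gram = keys @ keys.T + lam * np.eye(n_facts)
    coeffs = np.linalg.solve(gram, keys)  # (n_facts, d)
    return deltas.T @ coeffs              # (d, d)


def sparsify(memory, q):
    """Zero every entry whose magnitude is below the q-th quantile (0 <= q < 1)."""
    if q <= 0.0:
        return memory.copy()
    threshold = np.quantile(np.abs(memory), q)
    return np.where(np.abs(memory) >= threshold, memory, 0.0)


def apply_edit(hidden, memory, alpha):
    """Add the scaled memory read-out to a hidden state; no weights change."""
    return hidden + alpha * (memory @ hidden)
```

In this sketch the backbone stays frozen: `apply_edit` would be called on the hidden state at the chosen coordinate during each forward pass, and only `alpha`, `lam`, and `q` need tuning.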

