Fugu-MT paper translation (abstract): Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Paper abstract: Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

arxiv url: http://arxiv.org/abs/2605.09608v1
Date: Sun, 10 May 2026 15:40:44 GMT
Status: Translation complete
Last updated in system: 2026-05-13 02:24:05.536875
Title: Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
Authors: Yuanyi Wang, Yifan Yang, Su Lu, Yanggan Gu, Pengkai Wang, Wenjun Wang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Jialun Cao, Shing-Chi Cheung, Hongxia Yang,
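Abstract summary: We study continual post-training through three questions. Our central finding is that forgetting can be viewed as a state-relative update-integration failure. We propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method.
Reference score (independently computed attention score): 32.954434663995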
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that forgetting can be considered a state-relative update-integration failure: it arises when the covariance geometries induced by tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate geometry-aware correction. Across Qwen3 0.6B to 14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.
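The abstract speaks of comparing covariance geometries induced by parameter updates under a Wasserstein metric. As a rough illustration only, the sketch below computes the standard closed-form squared 2-Wasserstein (Bures) distance between two zero-mean Gaussians, W2^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). This is not the GCWM method itself, whose details the abstract does not give. The helper `update_covariance`, which builds a covariance from an update matrix, is a hypothetical choice made for this example, and all function names are assumptions.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh((m + m.T) / 2.0)  # symmetrize against round-off
    vals = np.clip(vals, 0.0, None)               # drop tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def update_covariance(delta_w):
    """Hypothetical covariance of a task update: second moment over its columns.

    Assumption for illustration; the paper's actual construction may differ.
    """
    return delta_w @ delta_w.T / delta_w.shape[1]


def bures_wasserstein_sq(a, b):
    """Squared 2-Wasserstein distance between N(0, a) and N(0, b)."""
    root_a = psd_sqrt(a)
    cross = psd_sqrt(root_a @ b @ root_a)
    value = np.trace(a) + np.trace(b) - 2.0 * np.trace(cross)
    return float(max(value, 0.0))                 # clamp round-off below zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two made-up "task updates" for one 8 x 32 weight matrix.
    delta_1 = rng.normal(size=(8, 32))
    delta_2 = rng.normal(size=(8, 32))
    cov_1 = update_covariance(delta_1)
    cov_2 = update_covariance(delta_2)
    print("distance between tasks:", bures_wasserstein_sq(cov_1, cov_2))
    print("distance to itself:    ", bures_wasserstein_sq(cov_1, cov_1))
```

A larger distance here would indicate less similar update covariances; how such a quantity is turned into a conflict score or a gate is left to the paper.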

Related papers