Fugu-MT Paper Translation (Summary): Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

Paper summary: Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

arxiv url: http://arxiv.org/abs/2604.11508v1
Date: Mon, 13 Apr 2026 14:11:47 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:16.590974
Title: Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers
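Title (reference translation): Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers
Authors: Miit Daga, Swarna Priya Ramu
Abstract summary: Per-sample correctness is tracked at every epoch during fine-tuning of ResNet-18 and DeiT-Small. Fifth, a sample's loss after head warmup predicts its long-term decay constant. The findings suggest that architectural diversity in ensembles provides complementary retention coverage.
Reference score (independently computed attention level): 0.0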
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-tuning pretrained image classifiers is standard practice, yet which individual samples are forgotten during this process, and whether forgetting patterns are stable or architecture-dependent, remains unclear. Understanding these dynamics has direct implications for curriculum design, data pruning, and ensemble construction. We track per-sample correctness at every epoch during fine-tuning of ResNet-18 and DeiT-Small on a retinal OCT dataset (7 classes, 56:1 imbalance) and CUB-200-2011 (200 bird species), fitting Ebbinghaus-style exponential decay curves to each sample's retention trace. Five findings emerge. First, the two architectures forget fundamentally different samples: Jaccard overlap of the top 10 percent most-forgotten samples is 0.34 on OCTDL and 0.15 on CUB-200. Second, ViT forgetting is more structured (mean $R^2 = 0.74$) than CNN forgetting ($R^2 = 0.52$). Third, per-sample forgetting is stochastic across random seeds (Spearman $\rho \approx 0.01$), challenging the assumption that sample difficulty is an intrinsic property. Fourth, class-level forgetting is consistent and semantically interpretable: visually similar species are forgotten most, distinctive ones least. Fifth, a sample's loss after head warmup predicts its long-term decay constant ($\rho = 0.30$ to $0.50$, $p < 10^{-45}$). These findings suggest that architectural diversity in ensembles provides complementary retention coverage, and that curriculum or pruning methods based on per-sample difficulty may not generalize across runs. A spaced repetition sampler built on these decay constants does not outperform random sampling, indicating that static scheduling cannot exploit unstable per-sample signals.
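Abstract (reference translation): Fine-tuning pretrained image classifiers is standard practice, but it remains unclear which individual samples are forgotten during this process and whether forgetting patterns are stable or architecture-dependent. Understanding these dynamics has direct implications for curriculum design, data pruning, and ensemble construction. ResNet-18 and DeiT-Small are fine-tuned on a retinal OCT dataset (7 classes, 56:1 imbalance) and CUB-200-2011 (200 bird species), and Ebbinghaus-style exponential decay curves are fitted to each sample's retention trace. There are five findings. First, the two architectures forget fundamentally different samples: the Jaccard overlap of the top 10% most-forgotten samples is 0.34 on OCTDL and 0.15 on CUB-200. Second, ViT forgetting is more structured (mean $R^2 = 0.74$) than CNN forgetting ($R^2 = 0.52$). Third, per-sample forgetting is stochastic across random seeds (Spearman $\rho \approx 0.01$), challenging the assumption that sample difficulty is an intrinsic property. Fourth, class-level forgetting is consistent and semantically interpretable. Fifth, a sample's loss after head warmup predicts its long-term decay constant ($\rho = 0.30$ to $0.50$, $p < 10^{-45}$). These results suggest that architectural diversity in ensembles provides complementary retention coverage, and that curriculum or pruning methods based on per-sample difficulty may not generalize across runs. A spaced repetition sampler built on these decay constants does not outperform random sampling, indicating that static scheduling cannot exploit unstable per-sample signals.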
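
Illustrative sketch (not from the paper): the abstract names three measurements, namely per-sample forgetting, the Jaccard overlap of the top 10 percent most-forgotten samples, and an exponential decay fit with its $R^2$. The Python below shows one plausible way to compute them. Everything in it is an assumption made for illustration: the function names are hypothetical, a forgetting event is taken to be a correct-to-incorrect flip between consecutive epochs, the retention trace is assumed to be a value in (0, 1] per epoch, and the fit uses the single-parameter form $R(t) = e^{-\lambda t}$ via least squares on $\log R$. The authors' actual definitions and fitting procedure may differ.

```python
import math

def count_forgetting_events(correct):
    """Count correct -> incorrect flips between consecutive epochs.
    `correct` is a per-epoch list of 0/1 values for one sample (assumed format)."""
    return sum(1 for prev, cur in zip(correct, correct[1:]) if prev and not cur)

def top_fraction(scores, fraction=0.10):
    """Return the ids with the highest scores; `scores` maps sample id -> score."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """Jaccard overlap |A intersect B| / |A union B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def fit_decay(retention, eps=1e-6):
    """Fit R(t) = exp(-lam * t) by least squares on log R, through the origin.
    Returns (lam, r2), with r2 computed in the original (non-log) space."""
    ts = range(len(retention))
    # Clip at eps so that zero retention does not break the logarithm.
    logs = [math.log(max(r, eps)) for r in retention]
    denom = sum(t * t for t in ts)
    lam = -sum(t * y for t, y in zip(ts, logs)) / denom if denom else 0.0
    preds = [math.exp(-lam * t) for t in ts]
    mean = sum(retention) / len(retention)
    ss_res = sum((r - p) ** 2 for r, p in zip(retention, preds))
    ss_tot = sum((r - mean) ** 2 for r in retention)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return lam, r2

if __name__ == "__main__":
    # Toy forgetting counts for two hypothetical models over the same 20 samples.
    model_a = {i: (i * 7) % 11 for i in range(20)}
    model_b = {i: (i * 3) % 13 for i in range(20)}
    overlap = jaccard(top_fraction(model_a), top_fraction(model_b))
    print("top-10% Jaccard overlap:", overlap)

    # Toy retention trace that decays over ten epochs.
    trace = [math.exp(-0.3 * t) for t in range(10)]
    print("fitted (lambda, R^2):", fit_decay(trace))
```

A Jaccard value near 0 means the two models forget largely disjoint sets of samples, which is the reading behind the reported 0.34 and 0.15. The log-space fit is chosen here only because it needs no external libraries; a nonlinear least-squares fit would weight the early epochs differently.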


Related papers