Fugu-MT paper translation (abstract): AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

Paper overview: AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

arxiv url: http://arxiv.org/abs/2605.28809v1
Date: Wed, 27 May 2026 17:58:16 GMT
Status: translation complete
Last updated in system: 2026-05-28 17:38:56.265218
Title: AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning
Title (reference translation): AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning
Authors: Zhen-Hao Xie, Yu-Cheng Shi, Da-Wei Zhou
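Abstract summary: Class-Incremental Learning (CIL) is important in building real-world learning systems. The paper proposes AREA for attribute extraction and aggregation in CLIP-based CIL. Experiments show that AREA consistently outperforms SOTA methods.
Reference score (independently computed attention metric): 17.715024506546957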
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing similarity between visual and textual embeddings obtained from template prompts, e.g., "a photo of a [CLASS]". This seemingly monolithic matching process can be decomposed into two conceptually distinct stages: attribute extraction and attribute aggregation. For example, a model may recognize cat using attributes such as fur texture and whiskers. When learning a new class like car, the model must extract additional attributes like wheels and adjust how they are aggregated in the shared representation space. However, since only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, leading to catastrophic forgetting. Therefore, we propose AREA for attribute extraction and aggregation in CLIP-based CIL. To stabilize extraction, we anchor class-level visual and textual attributes on the hyperspherical embedding space via principal geodesic analysis. To stabilize aggregation, we learn lightweight task-specific experts with scoring and residual refinement, regularized by a variational information bottleneck objective. During inference, we perform routing over task attribute manifolds via optimal transport for more concise prediction. Experiments show that AREA consistently outperforms SOTA methods. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA.
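Abstract (reference translation): Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, classification is performed by comparing the similarity between visual and textual embeddings obtained from template prompts (for example, "a photo of a [CLASS]"). This seemingly monolithic matching process can be decomposed into two conceptually distinct stages: attribute extraction and attribute aggregation. For example, a model may recognize a cat using attributes such as fur texture and whiskers. When learning a new class such as car, the model must extract additional attributes such as wheels and adjust how they are aggregated in the shared representation space. However, because only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, which leads to catastrophic forgetting. The authors therefore propose AREA for attribute extraction and aggregation in CLIP-based CIL. To stabilize extraction, class-level visual and textual attributes are anchored on the hyperspherical embedding space via principal geodesic analysis. To stabilize aggregation, lightweight task-specific experts with scoring and residual refinement are learned, regularized by a variational information bottleneck objective. During inference, routing over task attribute manifolds is performed via optimal transport for more concise prediction. Experiments show that AREA consistently outperforms SOTA methods. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA.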
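
The template-prompt matching step described in the abstract can be illustrated with a minimal sketch. It covers only the generic CLIP-style baseline (cosine similarity between one image embedding and one text embedding per class), not AREA itself. The three-dimensional vectors, the `classify` helper, and the `TEMPLATE` string are hypothetical stand-ins; a real system would obtain the embeddings from CLIP's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def classify(image_embedding, text_embeddings):
    """Return (best_class, scores); scores maps class name -> cosine similarity."""
    scores = {
        name: cosine_similarity(image_embedding, emb)
        for name, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Assumption: toy embeddings invented for illustration, not real CLIP outputs.
TEMPLATE = "a photo of a {}"
text_embeddings = {
    "cat": [0.9, 0.1, 0.0],  # stands in for the embedding of "a photo of a cat"
    "car": [0.0, 0.2, 0.9],  # stands in for the embedding of "a photo of a car"
}
image_embedding = [0.8, 0.3, 0.1]  # stands in for an encoded cat image

predicted, scores = classify(image_embedding, text_embeddings)
print(TEMPLATE.format(predicted), scores)
```

Because both embeddings are normalized to unit length, the comparison takes place on the unit hypersphere, which is the space the abstract refers to when it mentions anchoring attributes on the hyperspherical embedding space.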

Related papers