Fugu-MT Paper Translation (Abstract): ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

Paper overview: ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

arxiv url: http://arxiv.org/abs/2606.12633v1
Date: Wed, 10 Jun 2026 19:42:03 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.433584
Title: ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation
Title (reference translation): ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation
Authors: Jiangtao Kong, Peijun Zhao, Chun-Fu Chen, Youngwook Do, Shaohan Hu, Tianyi Zhou, Huajie Shao
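Abstract summary: Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images. This paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. It introduces the notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations.
Reference score (independently computed attention score): 22.537188820123962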
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks. To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.
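Abstract (reference translation): Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) allows a model to continuously generate accurate, contextually relevant text for new images while retaining the knowledge it has already acquired. Unlike prior work, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, it introduces a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to maintain high-quality cross-modal representations. Building on this idea, the authors propose Efficient Continual Alignment (ECA), a new exemplar-free IL approach for OpenITG. The key challenge is to let the model acquire new, task-specific features while minimizing interference with the established alignment, without access to raw data from previous tasks. ECA relies on three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands the model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, the authors construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experiments show that ECA significantly mitigates catastrophic forgetting and improves IL performance compared with baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.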
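The abstract says FeDEx relies on a Fisher Information Matrix (FIM)-based metric but does not define it. As a rough illustration of the general idea only, the sketch below estimates the diagonal of an empirical Fisher matrix for a toy logistic-regression model, where a larger entry suggests the data is more sensitive to that parameter. The model, the function name `diagonal_fisher`, and the synthetic data are all assumptions made for this example; this is not ECA's metric or its expansion rule.

```python
import numpy as np


def diagonal_fisher(weights, features, labels):
    """Empirical diagonal Fisher for logistic regression (illustrative only).

    Each entry is the mean over examples of the squared gradient of the
    log-likelihood log p(y | x; w) with respect to one weight.
    """
    logits = features @ weights
    probs = 1.0 / (1.0 + np.exp(-logits))
    # For logistic regression, d/dw log p(y | x; w) = (y - p) * x.
    per_example_grads = (labels - probs)[:, None] * features
    return np.mean(per_example_grads ** 2, axis=0)


# Hypothetical synthetic data, standing in for one task's training set.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
weights = np.array([1.5, -2.0, 0.0, 0.5])
true_probs = 1.0 / (1.0 + np.exp(-(features @ weights)))
labels = (rng.random(200) < true_probs).astype(float)

fisher = diagonal_fisher(weights, features, labels)
print(fisher)  # one non-negative sensitivity estimate per weight
```

In continual-learning methods, statistics of this kind are commonly used to judge which parameters matter for earlier tasks. How ECA turns its FIM-based metric into a decision to expand the model is described in the paper and the linked repository, not here.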


Related papers