Fugu-MT Paper Translation (Abstract): $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

Paper overview: $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

arxiv url: http://arxiv.org/abs/2602.22601v1
Date: Thu, 26 Feb 2026 04:14:33 GMT
Status: Translation complete
Last updated in system: 2026-02-27 18:41:22.521515
Title: $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
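Title (reference translation): $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
Authors: Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren, Bhiksha Raj, Khoa Luu
Abstract summary: This paper proposes a Fairness Direct Preference Optimization (FaiDPO, $φ$-DPO) framework for continual learning in LMMs. It first proposes a new continual learning paradigm based on Direct Preference Optimization (DPO). Extensive experiments and ablation studies show that the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks.
Reference score (independently computed attention score): 58.217707070069885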
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely underexplored. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $φ$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO on imbalanced data and present a new $φ$-DPO loss that explicitly addresses distributional biases. We provide a comprehensive theoretical analysis demonstrating that our approach addresses both forgetting and data imbalance. Additionally, to enable $φ$-DPO-based continual learning, we construct pairwise preference annotations for existing benchmarks in the context of continual learning. Extensive experiments and ablation studies show that the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks, outperforming prior continual learning methods for LMMs.
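Abstract (reference translation): Fairness in continual learning for Large Multimodal Models (LMMs) is an open challenge, particularly where imbalanced data distributions lead to biased model updates and suboptimal performance across tasks. Recent continual learning research has made progress on catastrophic forgetting, but the fairness problem caused by imbalanced data remains largely unexplored. This paper proposes a Fairness Direct Preference Optimization (FaiDPO, $φ$-DPO) framework for continual learning in LMMs. In particular, it first proposes a new continual learning paradigm based on Direct Preference Optimization (DPO). It then identifies the limitations of conventional DPO on imbalanced data and presents a new $φ$-DPO loss that explicitly addresses distributional bias. The proposed method is shown to address both forgetting and data imbalance. In addition, to enable $φ$-DPO-based continual learning, pairwise preference annotations are constructed for existing benchmarks in the continual learning setting. Extensive experiments and ablation studies show that the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks and outperforms prior continual learning methods for LMMs.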
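
Note: the abstract builds on conventional DPO but does not state the $φ$-DPO loss itself. For orientation only, the following is a minimal sketch of the conventional DPO loss for a single preference pair, not the paper's $φ$-DPO loss. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Conventional DPO loss for one (chosen, rejected) pair.

    This is the standard DPO objective, NOT the phi-DPO loss from the
    paper, whose form is not given in the abstract. All names and the
    default beta are hypothetical.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) equals softplus(-margin); use a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls below that as the policy raises the chosen response's log-probability relative to the rejected one.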

Related papers