Fugu-MT Paper Translation (Abstract): MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data

Paper overview: MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data

arxiv url: http://arxiv.org/abs/2603.08202v1
Date: Mon, 09 Mar 2026 10:29:50 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.794891
Title: MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
Title (reference translation): MM-TS: Multi-Modal Temperature and Margin Scheduling for Contrastive Learning with Long-Tail Data
Authors: Siarhei Sheludzko, Dhimitrios Duka, Bernt Schiele, Hilde Kuehne, Anna Kukleva
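Abstract summary: We propose Multi-Modal Temperature and Margin Schedules (MM-TS), extending the concept of uni-modal temperature scheduling to multi-modal contrastive learning. Our method dynamically adjusts the temperature in the contrastive loss during training, modulating the attraction and repulsion forces in the multi-modal setting.
Reference score (independently computed attention score): 64.78447637450937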
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contrastive learning has become a fundamental approach in both uni-modal and multi-modal frameworks. This learning paradigm pulls positive pairs of samples closer while pushing negatives apart. In the uni-modal setting (e.g., image-based learning), previous research has shown that the strength of these forces can be controlled through the temperature parameter. In this work, we propose Multi-Modal Temperature and Margin Schedules (MM-TS), extending the concept of uni-modal temperature scheduling to multi-modal contrastive learning. Our method dynamically adjusts the temperature in the contrastive loss during training, modulating the attraction and repulsion forces in the multi-modal setting. Additionally, recognizing that standard multi-modal datasets often follow imbalanced, long-tail distributions, we adapt the temperature based on the local distribution of each training sample. Specifically, samples from dense clusters are assigned a higher temperature to better preserve their semantic structure. Furthermore, we demonstrate that temperature scheduling can be effectively integrated within a max-margin framework, thereby unifying the two predominant approaches in multi-modal contrastive learning: InfoNCE loss and max-margin objective. We evaluate our approach on four widely used image- and video-language datasets, Flickr30K, MSCOCO, EPIC-KITCHENS-100, and YouCook2, and show that our dynamic temperature and margin schedules improve performance and lead to new state-of-the-art results in the field.
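Abstract (reference translation): Contrastive learning has become a fundamental approach in both uni-modal and multi-modal frameworks. This learning paradigm pulls positive pairs of samples closer together and pushes negatives apart. In the uni-modal setting (e.g., image-based learning), previous research has shown that the strength of these forces can be controlled through the temperature parameter. In this work, we propose Multi-Modal Temperature and Margin Schedules (MM-TS), which extend the concept of uni-modal temperature scheduling to multi-modal contrastive learning. Our method dynamically adjusts the temperature of the contrastive loss during training, modulating the attraction and repulsion forces in the multi-modal setting. In addition, recognizing that standard multi-modal datasets often follow imbalanced, long-tail distributions, we adapt the temperature based on the local distribution of each training sample. Specifically, samples from dense clusters are assigned a higher temperature to better preserve their semantic structure. Furthermore, we demonstrate that temperature scheduling can be effectively integrated into a max-margin framework, thereby unifying the two predominant approaches in multi-modal contrastive learning: the InfoNCE loss and the max-margin objective. We evaluate our approach on four widely used image- and video-language datasets (Flickr30K, MSCOCO, EPIC-KITCHENS-100, and YouCook2) and show that our dynamic temperature and margin schedules improve performance and lead to new state-of-the-art results in the field.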
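
The abstract describes two ideas that a small example can make concrete: a temperature that changes over training, and a per-sample adjustment that gives samples from dense clusters a higher temperature. The following is a minimal sketch under stated assumptions, not the authors' implementation. The cosine schedule, the `density` score in [0, 1], the `scale` factor, and all function names are hypothetical choices made for illustration; the abstract does not specify the schedule's form or how local density is measured. The margin schedule is not covered here.

```python
import math


def cosine_temperature(step, period, tau_min=0.05, tau_max=0.5):
    """Assumed schedule: oscillate between tau_max and tau_min with a cosine."""
    phase = (1 + math.cos(2 * math.pi * step / period)) / 2  # 1 at step 0, 0 at period/2
    return tau_min + (tau_max - tau_min) * phase


def per_sample_temperature(base_tau, density, scale=0.5):
    """Raise the temperature for samples in dense clusters.

    `density` is a hypothetical score in [0, 1]; how it is computed is not
    given in the abstract.
    """
    return base_tau * (1 + scale * density)


def info_nce(sim, taus):
    """One-directional InfoNCE over a square similarity matrix.

    sim[i][j] is the similarity between sample i of one modality and sample j
    of the other; the diagonal holds the positive pairs. taus[i] is the
    temperature used for row i.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / taus[i] for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)


# Toy batch: two image-text pairs, the first sample lying in a dense cluster.
sim = [[0.9, 0.1], [0.2, 0.8]]
densities = [1.0, 0.0]
tau = cosine_temperature(step=0, period=100)
taus = [per_sample_temperature(tau, d) for d in densities]
loss = info_nce(sim, taus)
```

In this sketch a lower temperature sharpens the softmax, which strengthens the repulsion of hard negatives, while a higher temperature spreads the gradient more evenly across the batch. Assigning the higher value to dense-cluster samples matches the direction stated in the abstract, though the exact rule here is only illustrative.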
