Fugu-MT Paper Translation (Summary): Adaptive Offline Quintuplet Loss for Image-Text Matching

Paper summary: Adaptive Offline Quintuplet Loss for Image-Text Matching

arxiv url: http://arxiv.org/abs/2003.03669v3
Date: Wed, 22 Jul 2020 14:58:18 GMT
Status: Translation complete
Last updated in system: 2022-12-25 19:58:25.786415
Title: Adaptive Offline Quintuplet Loss for Image-Text Matching
Authors: Tianlang Chen, Jiajun Deng, Jiebo Luo
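Abstract summary: Existing image-text matching approaches typically train the model with a triplet loss that uses online hard negatives. The paper proposes solutions based on sampling negatives offline from the whole training set. The proposed training approach is evaluated on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.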
Reference score (independently computed attention score): 102.50814151323965
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model. For each image or text anchor in a training mini-batch, the model is trained to distinguish between a positive and the most confusing negative of the anchor mined from the mini-batch (i.e. online hard negative). This strategy improves the model's capacity to discover fine-grained correspondences and non-correspondences between image and text inputs. However, the above approach has the following drawbacks: (1) the negative selection strategy still provides limited chances for the model to learn from very hard-to-distinguish cases. (2) The trained model has weak generalization capability from the training set to the testing set. (3) The penalty lacks hierarchy and adaptiveness for hard negatives with different "hardness" degrees. In this paper, we propose solutions by sampling negatives offline from the whole training set. It provides "harder" offline negatives than online hard negatives for the model to distinguish. Based on the offline hard negatives, a quintuplet loss is proposed to improve the model's generalization capability to distinguish positives and negatives. In addition, a novel loss function that combines the knowledge of positives, offline hard negatives and online hard negatives is created. It leverages offline hard negatives as the intermediary to adaptively penalize them based on their distance relations to the anchor. We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets. Significant performance improvements are observed for all the models, proving the effectiveness and generality of our approach. Code is available at https://github.com/sunnychencool/AOQ
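The baseline and the offline sampling idea described in the abstract can be illustrated with a minimal sketch. It assumes a precomputed image-text similarity matrix in which entry (i, i) is the only positive pair for anchor i, a hinge margin of 0.2, and a simple top-k ranking for offline mining. The function names `online_hard_triplet_loss` and `offline_hard_negatives` are hypothetical and are not taken from the authors' code. The abstract does not spell out the quintuplet loss or the adaptive penalty, so neither is reproduced here.

```python
from typing import List


def online_hard_triplet_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Triplet loss with the hardest in-batch negative, summed over the batch.

    sim[i][j] is the similarity between image i and text j in a mini-batch of
    at least two pairs. Pair (i, i) is assumed to be the only positive for
    anchor i (an assumption of this sketch).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Image anchor i: the hardest negative text is the best-scoring j != i.
        hardest_text = max(sim[i][j] for j in range(n) if j != i)
        # Text anchor i: the hardest negative image is the best-scoring k != i.
        hardest_image = max(sim[k][i] for k in range(n) if k != i)
        # Hinge terms: penalize a negative that comes within `margin` of the positive.
        total += max(0.0, margin - pos + hardest_text)
        total += max(0.0, margin - pos + hardest_image)
    return total


def offline_hard_negatives(sim_full: List[List[float]], top_k: int) -> List[List[int]]:
    """For each image, indices of the top_k most similar non-matching texts.

    sim_full covers the whole training set rather than one mini-batch, so the
    returned negatives can be harder than any negative found inside a batch.
    The top-k rule is an assumption of this sketch.
    """
    result = []
    for i, row in enumerate(sim_full):
        candidates = [j for j in range(len(row)) if j != i]
        # Rank non-matching texts from most to least similar to image i.
        candidates.sort(key=lambda j: row[j], reverse=True)
        result.append(candidates[:top_k])
    return result


if __name__ == "__main__":
    toy_sim = [
        [0.9, 0.8, 0.1],
        [0.2, 0.7, 0.3],
        [0.4, 0.1, 0.6],
    ]
    print(online_hard_triplet_loss(toy_sim))
    print(offline_hard_negatives(toy_sim, top_k=1))
```

The two functions run the same search at different scopes: one mini-batch for online mining, the whole training set for offline mining. That difference in scope is why the abstract calls the offline negatives "harder".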

Related papers

ReNeg: Learning Negative Embedding with Reward Guidance [69.81219455975477]
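In text-to-image (T2I) generation applications, negative embeddings have proven to be a simple yet effective approach for improving generation quality. We introduce ReNeg, an end-to-end method designed to learn improved negative embeddings guided by a reward model.
Paper reference translation (metadata) (2024-12-27T13:31:55Z)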
Conan-embedding: General Text Embedding with More and Better Negative Samples [30.571206231457932]
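We propose the Conan-embedding model, which maximizes the use of more and higher-quality negative examples. Our approach effectively improves the capability of embedding models and currently ranks on the Chinese leaderboard of the Massive Text Embedding Benchmark.
Paper reference translation (metadata) (2024-08-28T11:18:06Z)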
Active Mining Sample Pair Semantics for Image-text Matching [6.370886833310617]
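This paper proposes a new image-text matching model called the Active Mining Sample Pair Semantics image-text matching model (AMSPS). Compared with the single semantic learning mode of commonsense learning models that use a triplet loss function, AMSPS takes an active learning approach.
Paper reference translation (metadata) (2023-11-09T15:03:57Z)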
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining [58.379339799777064]
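Large vision-language models (VLMs) show strong representational capabilities and are ubiquitous for enhancing image and text understanding tasks. We propose a framework that not only mines negative samples in both directions but also generates them in both directions. Our code and dataset are available at https://ugorsahin.github.io/enhancing-multimodal-compositional-reasoning-of-vlm.html.
Paper reference translation (metadata) (2023-11-07T13:05:47Z)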
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination [62.18768931714238]
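The proposed method is a novel False Negative Elimination (FNE) strategy based on sampling. The results demonstrate the superiority of the proposed false negative elimination strategy.
Paper reference translation (metadata) (2023-08-08T16:31:43Z)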
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval [19.161248757493386]
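This paper proposes TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC), which automatically generates synthetic sentences to serve as negative sentences. To maintain difficulty during training, retrieval and generation improve each other through parameter sharing. Experiments on MS-COCO and Flickr30K validate the model's effectiveness in comparison with current state-of-the-art models.
Paper reference translation (metadata) (2021-11-05T09:36:41Z)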
Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
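The cross-entropy loss tends to focus on hard-to-classify samples during training. We show that adding the expectation loss to the optimization objective improves network accuracy. Experiments show that the new training protocol improves performance across diverse classification domains.
Paper reference translation (metadata) (2021-09-12T23:14:06Z)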
Self-Damaging Contrastive Learning [92.34124578823977]
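Unlabeled data are generally imbalanced and exhibit a long-tailed distribution. This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) that automatically balances representation learning without knowing the classes. Experiments show that SDCLR significantly improves not only overall accuracy but also balancedness.
Paper reference translation (metadata) (2021-06-06T00:04:49Z)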
Contrastive Learning with Hard Negative Samples [80.12117639845678]
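We develop a new unsupervised sampling method for selecting hard negative samples. A limiting case of this sampling yields a representation that tightly clusters each class and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, takes only a few lines of code to implement, and introduces no computational overhead.
Paper reference translation (metadata) (2020-10-09T14:18:53Z)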
SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
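Large-scale network embedding learns a latent representation for each node in an unsupervised manner. The key to the success of such contrastive learning methods is how positive and negative samples are drawn. This paper proposes SCE for unsupervised network embedding using only negative samples.
Paper reference translation (metadata) (2020-06-30T03:18:15Z)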

The related papers list is generated automatically from the titles and abstracts of papers on this site.
