Fugu-MT Paper Translation (Summary): On Distinctive Image Captioning via Comparing and Reweighting

Paper Summary: On Distinctive Image Captioning via Comparing and Reweighting

arxiv url: http://arxiv.org/abs/2204.03938v1
Date: Fri, 8 Apr 2022 08:59:23 GMT
Status: Translation complete
Last updated in system: 2022-04-11 12:19:10.781132
Title: On Distinctive Image Captioning via Comparing and Reweighting
Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan
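Abstract Summary: This paper aims to improve the distinctiveness of image captions by comparing and reweighting them against a set of similar images. The proposed metric reveals that the human annotations for each image in the MSCOCO dataset are not equivalent in terms of distinctiveness. However, previous works usually treat all human annotations equally during training.
Reference score (independently calculated attention score): 52.3731631461383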
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent image captioning models are achieving impressive results based on popular metrics, e.g., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human annotations could result in the use of common words and phrases, which lack distinctiveness, i.e., many similar images have the same caption. In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent based on distinctiveness; however, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions. In contrast, we reweight each ground-truth caption according to its distinctiveness during training. We further integrate a long-tailed weight strategy to highlight the rare words that contain more information, and captions from the similar image set are sampled as negative examples to encourage the generated sentence to be unique. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.
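The abstract describes CIDErBtw only in words. Below is a minimal Python sketch, assuming CIDErBtw is the average CIDEr score of a caption against the ground-truth captions of each image in its similar-image set, so that a lower score means a more distinctive caption. It uses a simplified CIDEr (TF-IDF weighted n-gram cosine similarity for n = 1..4) without the CIDEr-D length penalty, count clipping, or x10 scaling, and it assumes the similar-image sets are already given. The names `SimpleCider` and `cider_btw`, the whitespace tokenizer, and the toy captions are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class SimpleCider:
    """Simplified CIDEr: mean cosine similarity of TF-IDF n-gram vectors, n = 1..4.

    Hypothetical sketch: omits the CIDEr-D Gaussian length penalty,
    count clipping, and x10 scaling.
    """

    def __init__(self, corpus, max_n=4):
        # corpus: one list of ground-truth captions per image (used only for IDF).
        self.max_n = max_n
        self.log_num_images = math.log(len(corpus))
        self.df = Counter()  # number of images whose captions contain each n-gram
        for refs in corpus:
            seen = set()
            for caption in refs:
                tokens = caption.lower().split()
                for n in range(1, max_n + 1):
                    seen.update(ngrams(tokens, n))
            self.df.update(seen)

    def _vectors(self, caption):
        # One TF-IDF vector (dict: n-gram -> weight) per n-gram order.
        tokens = caption.lower().split()
        return [
            {g: tf * (self.log_num_images - math.log(max(1.0, self.df[g])))
             for g, tf in ngrams(tokens, n).items()}
            for n in range(1, self.max_n + 1)
        ]

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(g, 0.0) for g, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

    def score(self, candidate, refs):
        """Simplified CIDEr of `candidate` against one image's reference captions."""
        cand = self._vectors(candidate)
        total = 0.0
        for ref in refs:
            per_n = [self._cosine(c, r) for c, r in zip(cand, self._vectors(ref))]
            total += sum(per_n) / self.max_n
        return total / len(refs)


def cider_btw(scorer, caption, similar_ref_sets):
    """Hypothetical CIDErBtw sketch: mean CIDEr of `caption` against the
    ground-truth captions of each similar image (lower = more distinctive)."""
    return sum(scorer.score(caption, refs) for refs in similar_ref_sets) / len(similar_ref_sets)


# Toy data (hypothetical): image 0 is the target; images 1 and 2 form its similar set.
corpus = [
    ["a bus on the street", "red double decker tour bus"],
    ["a bus on the street", "a white bus"],
    ["a bus on the road", "a blue bus"],
    ["a cat on a sofa"],
    ["two dogs playing"],
]
scorer = SimpleCider(corpus)
similar_sets = [corpus[1], corpus[2]]

# Rank the target image's ground-truth captions from least to most distinctive.
ranked = sorted(corpus[0], key=lambda c: cider_btw(scorer, c, similar_sets), reverse=True)
for caption in ranked:
    print(f"{cider_btw(scorer, caption, similar_sets):.3f}  {caption}")
```

In this toy example the generic caption "a bus on the street" overlaps heavily with the similar images' captions and so receives a much higher CIDErBtw than "red double decker tour bus", mirroring the abstract's observation that an image's human annotations differ in distinctiveness. How the paper turns these scores into per-caption training weights is not specified in the abstract, so the sketch stops at ranking.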
