Fugu-MT paper translation (abstract): Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention

Paper overview: Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention

arxiv url: http://arxiv.org/abs/2504.02496v1
Date: Thu, 03 Apr 2025 11:19:51 GMT
Status: Translation complete
System update date: 2025-04-11 17:33:11.299945
Title: Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Title (reference translation): Group-Based Distinctive Image Captioning Considering Memory Difference Encoding and Attention
Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan
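Abstract summary: A group-based differential distinctive captioning method built around a Group-based Differential Memory Attention (GDMA) module, together with a new evaluation metric, DisWordRate.
Reference score (independently computed attention score): 62.246950834745796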
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Recent advances in image captioning have focused on enhancing accuracy by substantially increasing the dataset and model size. While conventional captioning models exhibit high performance on established metrics such as BLEU, CIDEr, and SPICE, the capability of captions to distinguish the target image from other similar images is under-explored. To generate distinctive captions, a few pioneers employed contrastive learning or re-weighted the ground-truth captions. However, these approaches often overlook the relationships among objects in a similar image group (e.g., items or properties within the same album or fine-grained events). In this paper, we introduce a novel approach to enhance the distinctiveness of image captions, namely Group-based Differential Distinctive Captioning Method, which visually compares each image with other images in one similar group and highlights the uniqueness of each image. In particular, we introduce a Group-based Differential Memory Attention (GDMA) module, designed to identify and emphasize object features in an image that are uniquely distinguishable within its image group, i.e., those exhibiting low similarity with objects in other images. This mechanism ensures that such unique object features are prioritized during caption generation for the image, thereby enhancing the distinctiveness of the resulting captions. To further refine this process, we select distinctive words from the ground-truth captions to guide both the language decoder and the GDMA module. Additionally, we propose a new evaluation metric, the Distinctive Word Rate (DisWordRate), to quantitatively assess caption distinctiveness. Quantitative results indicate that the proposed method significantly improves the distinctiveness of several baseline models, and achieves state-of-the-art performance on distinctiveness while not excessively sacrificing accuracy...
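Abstract (reference translation): Recent advances in image captioning have focused on improving accuracy by substantially increasing dataset and model size. Conventional captioning models perform well on established metrics such as BLEU, CIDEr, and SPICE, but the ability of captions to distinguish the target image from other similar images remains under-explored. To generate distinctive captions, a few pioneering works adopted contrastive learning or re-weighted the ground-truth captions. However, these approaches often overlook the relationships among objects within a group of similar images (e.g., items or properties within the same album, or fine-grained events). This paper proposes a group-based differential distinctive captioning method that improves the distinctiveness of image captions by visually comparing each image with the other images in its similar group and highlighting what is unique to each image. In particular, it introduces a Group-based Differential Memory Attention (GDMA) module that identifies and emphasizes object features in an image that are uniquely distinguishable within its group, i.e., those with low similarity to objects in other images. This mechanism prioritizes such unique object features during caption generation, thereby increasing the distinctiveness of the resulting captions. To refine this process further, distinctive words are selected from the ground-truth captions to guide both the language decoder and the GDMA module. In addition, the paper proposes a new evaluation metric, the Distinctive Word Rate (DisWordRate), to quantitatively assess caption distinctiveness. Quantitative results indicate that the proposed method significantly improves the distinctiveness of several baseline models and achieves state-of-the-art distinctiveness without excessively sacrificing accuracy.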
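
Illustrative sketch (not from the paper): the abstract says GDMA emphasizes object features that have low similarity with objects in the other images of the group. The Python below shows one simple way such a weighting could be computed. Everything in it is an assumption made for illustration: the helper names `cosine` and `distinctiveness_weights`, the use of cosine similarity, the score "1 minus the maximum similarity", and the softmax normalization. The actual GDMA module is a learned attention mechanism and may differ in every one of these choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def distinctiveness_weights(target_objects, other_images):
    """Hypothetical helper: weight each object of the target image by how
    dissimilar it is to every object in the other images of the group.

    target_objects: list of feature vectors, one per object in the target image.
    other_images:   list of images, each a list of object feature vectors.
    Returns one weight per target object; the weights sum to 1.
    """
    scores = []
    for obj in target_objects:
        sims = [cosine(obj, other) for image in other_images for other in image]
        max_sim = max(sims) if sims else 0.0
        # Assumed score: an object that closely matches something elsewhere
        # in the group gets a low score, a unique object gets a high one.
        scores.append(1.0 - max_sim)
    if not scores:
        return []
    # Softmax turns the scores into attention-like weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy group: the first target object also appears in the other images,
# the second one does not.
target = [[1.0, 0.0], [0.0, 1.0]]
others = [[[1.0, 0.0]], [[0.9, 0.1]]]
weights = distinctiveness_weights(target, others)
```

In this toy group the second object, which has no close match in the other images, receives the larger weight. That is the behavior the abstract describes at a high level: unique object features are prioritized when the caption is generated.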

Related papers