Fugu-MT Paper Translation (Abstract): JourneyDB: A Benchmark for Generative Image Understanding

Paper summary: JourneyDB: A Benchmark for Generative Image Understanding

arxiv url: http://arxiv.org/abs/2307.00716v2
Date: Sat, 28 Oct 2023 11:46:07 GMT
Status: Translation complete
Last updated in system: 2023-10-31 21:02:58.538892
Title: JourneyDB: A Benchmark for Generative Image Understanding
Authors: Keqiang Sun, Junting Pan, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, Renrui Zhang, Aojun Zhou, Zipeng Qin, Yi Wang, Jifeng Dai, Yu Qiao, Limin Wang, Hongsheng Li
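Abstract summary: We introduce JourneyDB, a comprehensive dataset for the domain of generative images. The meticulously curated dataset comprises 4 million distinct, high-quality images. On this dataset, we devise four benchmarks to evaluate how well models understand generated images.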
Reference score (independently calculated attention score): 89.02046606392382
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models possess the ability to comprehend generated images remains uncertain. Synthetic images, in comparison to real data, encompass a higher level of diversity in terms of both content and style, thereby presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. Our meticulously curated dataset comprises 4 million distinct and high-quality generated images, each paired with the corresponding text prompts that were employed in their creation. Furthermore, we additionally introduce an external subset with results of another 22 text-to-image generative models, which makes JourneyDB a comprehensive benchmark for evaluating the comprehension of generated images. On our dataset, we have devised four benchmarks to assess the performance of generated image comprehension in relation to both content and style interpretation. These benchmarks encompass prompt inversion, style retrieval, image captioning, and visual question answering. Lastly, we evaluate the performance of state-of-the-art multi-modal models when applied to the JourneyDB dataset, providing a comprehensive analysis of their strengths and limitations in comprehending generated content. We anticipate that the proposed dataset and benchmarks will facilitate further research in the field of generative content understanding. The dataset is publicly available at https://journeydb.github.io.
