Fugu-MT paper translation (abstract): From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

Paper overview: From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

arxiv url: http://arxiv.org/abs/2604.21786v1
Date: Thu, 23 Apr 2026 15:44:14 GMT
Status: translation complete
Last updated in system: 2026-04-24 14:40:06.694188
Title: From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
Title (reference translation): From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
Authors: Katharina Prasse, Steffen Jung, Isaac Bravo, Stefanie Walter, Patrick Knab, Christian Bartelt, Margret Keuper
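Abstract summary: We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter).
Reference score (independently computed attention metric): 22.261744577934554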
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Social media platforms have become primary arenas for climate communication, generating millions of images and posts that - if systematically analysed - can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter) - a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images, with 50,000 labels manually validated - spanning five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. Beyond instance-level metrics, we advocate for distributional evaluation: VLM predictions can reliably recover population-level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that annotation-dimension-specific prompt design improves performance. We release tweet IDs and labels along with our code at https://github.com/KathPra/Codebooks2VLMs.git.
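Abstract (reference translation): Social media platforms generate millions of images and posts that, if systematically analysed, can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter): a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. VLM predictions can reliably recover population-level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that annotation-dimension-specific prompt design improves performance. We release tweet IDs and labels at https://github.com/KathPra/Codebooks2VLMs.git.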
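
The abstract's distinction between instance-level and distributional evaluation can be illustrated with a toy sketch. It is not the authors' evaluation code: the class names, the toy labels, and the choice of total variation distance as the distributional measure are all assumptions made for illustration, since the abstract does not state which measure the paper uses.

```python
from collections import Counter

def accuracy(gold, pred):
    """Instance-level metric: share of images whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def label_distribution(labels, classes):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}

def total_variation(p, q):
    """Total variation distance between two distributions over the same classes (assumed measure)."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)

# Hypothetical classes for the "image type" dimension; not taken from the paper's codebook.
classes = ["photo", "illustration", "infographic"]

# Toy labels: two per-image errors that cancel out at the population level.
gold = ["photo", "photo", "illustration", "infographic", "photo", "illustration"]
pred = ["photo", "illustration", "photo", "infographic", "photo", "illustration"]

acc = accuracy(gold, pred)  # 4 of 6 labels match
tv = total_variation(label_distribution(gold, classes),
                     label_distribution(pred, classes))  # class shares are identical

print(f"per-image accuracy: {acc:.3f}")
print(f"total variation distance: {tv:.3f}")
```

In this toy case one third of the per-image labels are wrong, yet the predicted class shares match the gold shares exactly, which is the kind of situation in which population-level trends remain recoverable despite moderate accuracy.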


Related papers