Fugu-MT Paper Translation (Abstract): TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models

Paper summary: TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models

arxiv url: http://arxiv.org/abs/2601.14951v1
Date: Wed, 21 Jan 2026 12:52:23 GMT
Status: Translation complete
Last updated in system: 2026-01-22 21:27:50.36184
Title: TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
Authors: Carolin Holtermann, Nina Krebs, Anne Lauscher
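Abstract summary: TempViz is the first data set to holistically evaluate temporal knowledge in image generation. The study examines the capabilities of five T2I models across five temporal knowledge categories. Human evaluation shows that temporal competence is generally weak, with no model exceeding 75% accuracy across categories.
Reference score (independently computed attention metric): 27.40006053562777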
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Time alters the visual appearance of entities in our world, like objects, places, and animals. Thus, for accurately generating contextually-relevant images, knowledge and reasoning about time can be crucial (e.g., for generating a landscape in spring vs. in winter). Yet, although substantial work exists on understanding and improving temporal knowledge in natural language processing, research on how temporal phenomena appear and are handled in text-to-image (T2I) models remains scarce. We address this gap with TempViz, the first data set to holistically evaluate temporal knowledge in image generation, consisting of 7.9k prompts and more than 600 reference images. Using TempViz, we study the capabilities of five T2I models across five temporal knowledge categories. Human evaluation shows that temporal competence is generally weak, with no model exceeding 75% accuracy across categories. Towards larger-scale studies, we also examine automated evaluation methods, comparing several established approaches against human judgments. However, none of these approaches provides a reliable assessment of temporal cues - further indicating the pressing need for future research on temporal knowledge in T2I.

Related papers