Fugu-MT Paper Translation (Abstract): SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering

Paper overview: SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering

arxiv url: http://arxiv.org/abs/2603.28363v1
Date: Mon, 30 Mar 2026 12:30:51 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.390537
Title: SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering
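Title (reference translation): SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering
Authors: Jiho Park, Sieun Choi, Jaeyoon Seo, Minho Sohn, Yeana Kim, Jihie Kim
Abstract summary: We introduce SEA (Sketch Evaluation metric for Abstraction efficiency), a reference-free metric that assesses how economically a sketch represents class-defining visual elements. CommonSketch is the first semantically annotated sketch dataset, comprising 23,100 human-drawn sketches across 300 classes.
Reference score (independently computed attention score): 4.267729144203295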
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A sketch is a distilled form of visual abstraction that conveys core concepts through simplified yet purposeful strokes while omitting extraneous detail. Despite its expressive power, quantifying the efficiency of semantic abstraction in sketches remains challenging. Existing evaluation methods that rely on reference images, low-level visual features, or recognition accuracy do not capture abstraction, the defining property of sketches. To address these limitations, we introduce SEA (Sketch Evaluation metric for Abstraction efficiency), a reference-free metric that assesses how economically a sketch represents class-defining visual elements while preserving semantic recognizability. These elements are derived per class from commonsense knowledge about features typically depicted in sketches. SEA leverages a visual question answering model to determine the presence of each element and returns a quantitative score that reflects semantic retention under visual economy. To support this metric, we present CommonSketch, the first semantically annotated sketch dataset, comprising 23,100 human-drawn sketches across 300 classes, each paired with a caption and element-level annotations. Experiments show that SEA aligns closely with human judgments and reliably discriminates levels of abstraction efficiency, while CommonSketch serves as a benchmark providing systematic evaluation of element-level sketch understanding across various vision-language models.
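Abstract (reference translation): A sketch is a distilled form of visual abstraction that conveys core concepts through simple yet purposeful strokes while omitting extraneous detail. Despite its expressive power, quantifying the efficiency of semantic abstraction in sketches remains difficult. Existing evaluation methods that rely on reference images, low-level visual features, or recognition accuracy do not capture abstraction, the defining property of sketches. To address these limitations, we introduce SEA (Sketch Evaluation metric for Abstraction efficiency), which assesses how economically a sketch represents class-defining visual elements while preserving semantic recognizability. These elements are derived per class from commonsense knowledge about features typically depicted in sketches. SEA uses a visual question answering model to determine the presence of each element and returns a quantitative score that reflects semantic retention under visual economy. To support this metric, we present CommonSketch, the first semantically annotated sketch dataset, comprising 23,100 human-drawn sketches across 300 classes, each paired with a caption and element-level annotations. Experiments show that SEA aligns closely with human judgments and reliably discriminates levels of abstraction efficiency, while CommonSketch serves as a benchmark providing systematic evaluation of element-level sketch understanding across various vision-language models.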
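The abstract describes SEA only at a high level: a visual question answering model checks whether each class-defining element is present, and the result is combined into a score reflecting semantic retention under visual economy. The following is a minimal Python sketch of that general idea, not the paper's formula. The names `check_elements` and `toy_efficiency_score`, the question template, the use of stroke count as a proxy for visual economy, and the `stroke_budget` parameter are all assumptions made for illustration, and the VQA model is replaced by a stub function.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence

# Hypothetical VQA interface: (sketch, yes/no question) -> True if judged present.
# A real system would call a vision-language model here.
VqaFn = Callable[[Any, str], bool]


@dataclass
class ElementReport:
    present: list[str]
    missing: list[str]


def check_elements(sketch: Any, class_name: str,
                   elements: Sequence[str], vqa: VqaFn) -> ElementReport:
    """Ask one yes/no question per class-defining element."""
    present: list[str] = []
    missing: list[str] = []
    for element in elements:
        # Assumed question template; the abstract does not specify the wording.
        question = f"Does this sketch of a {class_name} show {element}?"
        (present if vqa(sketch, question) else missing).append(element)
    return ElementReport(present, missing)


def toy_efficiency_score(report: ElementReport,
                         stroke_count: int, stroke_budget: int) -> float:
    """Illustrative score only (NOT the SEA formula from the paper).

    retention: fraction of expected elements judged present.
    economy:   1.0 when the sketch stays within an assumed stroke budget,
               shrinking proportionally as it uses more strokes.
    """
    total = len(report.present) + len(report.missing)
    if total == 0:
        raise ValueError("at least one element is required")
    retention = len(report.present) / total
    economy = min(1.0, stroke_budget / max(stroke_count, 1))
    return retention * economy


# Stub VQA: the "sketch" is a dict listing what was drawn (assumption for the demo).
def stub_vqa(sketch: Any, question: str) -> bool:
    return any(item in question for item in sketch["drawn"])


cat_elements = ["ears", "whiskers", "tail", "four legs"]
cat_sketch = {"drawn": {"ears", "whiskers", "tail"}}

report = check_elements(cat_sketch, "cat", cat_elements, stub_vqa)
print(report.missing)
print(toy_efficiency_score(report, stroke_count=10, stroke_budget=20))
print(toy_efficiency_score(report, stroke_count=40, stroke_budget=20))
```

In this toy version, a sketch that keeps the same elements but uses twice the assumed stroke budget receives half the score, which mirrors the notion of rewarding semantic retention achieved with fewer marks. How the paper actually measures visual economy is not stated in the abstract.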

Related papers