Fugu-MT Paper Translation (Abstract): Visual Retrieval-Augmented Generation for Silhouette-Guided Animal Art

Paper overview: Visual Retrieval-Augmented Generation for Silhouette-Guided Animal Art

arxiv url: http://arxiv.org/abs/2606.17431v1
Date: Tue, 16 Jun 2026 02:24:21 GMT
Status: Translation complete
Last updated in system: 2026-06-17 17:15:32.220162
Title: Visual Retrieval-Augmented Generation for Silhouette-Guided Animal Art
Authors: Quoc-Duy Tran, Anh-Tuan Vo, Trung-Nghia Le
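Abstract summary: This paper introduces Visual Retrieval-Augmented Generation (Visual-RAG), which generates animal art directly from natural silhouettes. The method retrieves structurally similar animal shapes from a curated corpus of 28,586 high-quality silhouettes. The results show that while Visual-RAG provides plausible interpretations, challenges remain in achieving high perceptual impact.
Reference score (independently computed attention score): 4.154815727446656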
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Generative AI has advanced the ability to render photorealistic or artistic images, yet it remains limited in a key aspect of human creativity: interpreting ambiguous shapes. This phenomenon, rooted in pareidolia, allows humans to perceive meaningful forms in random patterns such as clouds, stones, or leaves. To computationally replicate this imaginative process, we introduce Visual Retrieval-Augmented Generation (Visual-RAG), a framework that generates animal art directly from natural silhouettes. Our method retrieves structurally similar animal shapes from a curated corpus of 28,586 high-quality silhouettes and uses them as reference exemplars to guide diffusion-based generation with ControlNet and IP-Adapter. Ablation studies confirm that Shape Context with RANSAC provides the most accurate alignment, while removing shape standardization reduces the inlier ratio to just 13.4%, underscoring the importance of structural fidelity in Visual-RAG. A user study with 12 participants evaluated the outputs in terms of aesthetics, silhouette fidelity, and overall impression. Results reveal that while Visual-RAG provides plausible interpretations, challenges remain in achieving high perceptual impact. This work lays the foundation for computational pareidolia, showing how machines can contribute to the early stages of imaginative discovery.
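The abstract reports alignment quality as a RANSAC inlier ratio. The following is a minimal sketch of how such a ratio could be computed. It is not the authors' implementation: it assumes point correspondences between two silhouette contours are already available (for example from descriptor matching), and it assumes a 2D similarity transform as the alignment model. The function names, the threshold, and the iteration count are all hypothetical.

```python
import random


def fit_similarity(p1, p2, q1, q2):
    """Solve q = a * p + b for complex a, b from two correspondences.

    Points are complex numbers (x + yj); a encodes rotation and scale,
    b encodes translation. Returns None for a degenerate sample.
    """
    if p1 == p2:
        return None
    a = (q1 - q2) / (p1 - p2)
    b = q1 - a * p1
    return a, b


def ransac_inlier_ratio(src, dst, threshold=0.05, iterations=200, seed=0):
    """Return the best inlier fraction found over random 2-point models.

    src[i] is assumed to correspond to dst[i]. threshold is the maximum
    residual distance for a correspondence to count as an inlier.
    """
    rng = random.Random(seed)
    n = len(src)
    best = 0
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        model = fit_similarity(src[i], src[j], dst[i], dst[j])
        if model is None:
            continue
        a, b = model
        inliers = sum(
            1 for p, q in zip(src, dst) if abs(a * p + b - q) <= threshold
        )
        best = max(best, inliers)
    return best / n


# Toy data (assumed for illustration): a 5x4 grid of points, of which the
# first 16 follow one similarity transform and the last 4 are displaced.
src = [complex(k % 5, k // 5) for k in range(20)]
true_a, true_b = complex(0.7, 0.4), complex(1.0, 2.0)
dst = [true_a * p + true_b for p in src]
for k in range(16, 20):
    dst[k] += complex(100, 100) * (k - 15)

ratio = ransac_inlier_ratio(src, dst)
print(f"inlier ratio: {ratio:.3f}")
```

A low ratio under this kind of measure means few correspondences agree with any single rigid alignment, which is how the abstract's 13.4% figure for the ablated variant can be read.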

Related papers