Fugu-MT Paper Translation (Abstract): HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement

Paper overview: HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement

arxiv url: http://arxiv.org/abs/2604.10675v1
Date: Sun, 12 Apr 2026 14:59:52 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.167479
Title: HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement
Title (reference translation): HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement
Authors: Marco Schouten, Ioannis Siglidis, Serge Belongie, Dim P. Papadopoulos
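Abstract summary: This work proposes a method to learn explicit spatial priors for object placement in natural scenes. It introduces a fully automated and scalable framework that evaluates dense object placements on high-quality real backgrounds. The priors are distilled into a lightweight model for fast practical inference (230,000x faster).
Reference score (independently computed attention score): 5.872282538713026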
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a method to learn explicit, class-conditioned spatial priors for object placement in natural scenes by distilling the implicit placement knowledge encoded in text-conditioned diffusion models. Prior work relies either on manually annotated data, which is inherently limited in scale, or on inpainting-based object-removal pipelines, whose artifacts promote shortcut learning. To address these limitations, we introduce a fully automated and scalable framework that evaluates dense object placements on high-quality real backgrounds using a diffusion-based inpainting pipeline. With this pipeline, we construct HiddenObjects, a large-scale dataset comprising 27M placement annotations, evaluated across 27k distinct scenes, with ranked bounding box insertions for different images and object categories. Experimental results show that our spatial priors outperform sparse human annotations on a downstream image editing task (3.90 vs. 2.68 VLM-Judge), and significantly surpass existing placement baselines and zero-shot Vision-Language Models for object placement. Furthermore, we distill these priors into a lightweight model for fast practical inference (230,000x faster).
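Abstract (reference translation): This work proposes a method that learns explicit, class-conditioned spatial priors for object placement in natural scenes by distilling the implicit placement knowledge encoded in text-conditioned diffusion models. Prior work relies either on manually annotated data, which is inherently limited in scale, or on inpainting-based object-removal pipelines, whose artifacts promote shortcut learning. To address these limitations, the authors introduce a fully automated and scalable framework that uses a diffusion-based inpainting pipeline to evaluate dense object placements on high-quality real backgrounds. With this pipeline they construct HiddenObjects, a large-scale dataset of 27M placement annotations evaluated across 27k distinct scenes, with ranked bounding box insertions for different images and object categories. Experiments show that the spatial priors outperform sparse human annotations on a downstream image editing task (3.90 vs. 2.68 VLM-Judge) and significantly surpass existing placement baselines and zero-shot Vision-Language Models for object placement. The priors are further distilled into a lightweight model for fast practical inference (230,000x faster).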
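
The abstract describes the dataset as ranked bounding box insertions per image and object category; at 27M annotations over 27k scenes, that is roughly 1,000 placement annotations per scene on average. The sketch below illustrates one way such ranked boxes could be turned into a per-pixel spatial prior. It is a toy illustration only, not the authors' pipeline: the `Placement` record, its `score` field, and both helper functions are hypothetical names introduced here, and the scores are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record: one candidate box (pixel coords) with a score."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float  # assumption: higher means a more plausible placement


def rank_placements(placements):
    """Return the candidates sorted from most to least plausible."""
    return sorted(placements, key=lambda p: p.score, reverse=True)


def placement_heatmap(placements, height, width):
    """Accumulate box scores per pixel and normalize the map to sum to 1."""
    heat = [[0.0] * width for _ in range(height)]
    for p in placements:
        # Clip each box to the image bounds before accumulating.
        for y in range(max(p.y0, 0), min(p.y1, height)):
            for x in range(max(p.x0, 0), min(p.x1, width)):
                heat[y][x] += max(p.score, 0.0)
    total = sum(map(sum, heat))
    if total > 0:
        heat = [[v / total for v in row] for row in heat]
    return heat


# Toy candidates for one scene and one object class (values are made up).
candidates = [
    Placement(0, 0, 4, 4, score=0.2),
    Placement(2, 2, 6, 6, score=0.9),
    Placement(4, 4, 8, 8, score=0.5),
]
ranked = rank_placements(candidates)
heat = placement_heatmap(candidates, height=8, width=8)
```

Pixels covered by several high-scoring boxes receive the most mass, so the normalized map can be read as a simple class-conditioned placement prior for that scene. A distilled model, as mentioned in the abstract, would predict such a map directly instead of scoring every candidate box.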


Related papers