Fugu-MT Paper Translation (Summary): PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal

Paper summary: PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal

arxiv url: http://arxiv.org/abs/2603.27555v1
Date: Sun, 29 Mar 2026 07:34:08 GMT
Status: Translation complete
System update date: 2026-03-31 23:18:45.023196
Title: PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
Title (reference translation): PANDORA: Pixel-wise attention dissolution and latent guidance for zero-shot object removal
Authors: Dinh-Khoi Vo, Van-Loc Nguyen, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le
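Abstract summary: This work proposes PANDORA, a novel zero-shot object removal framework that operates directly on pre-trained text-to-image diffusion models. It proposes Pixel-wise Attention Dissolution, which removes an object by nullifying the most correlated attention keys for masked pixels. It further introduces Localized Attentional Disentanglement Guidance to steer denoising toward latent manifolds favorable to object removal.
Reference score (independently computed attention score): 18.565422674751215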
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Removing objects from natural images is challenging due to the difficulty of synthesizing semantically coherent content while preserving background integrity. Existing methods often rely on fine-tuning, prompt engineering, or inference-time optimization, yet still suffer from texture inconsistency, rigid artifacts, weak foreground-background disentanglement, and poor scalability for multi-object removal. We propose a novel zero-shot object removal framework, namely PANDORA, that operates directly on pre-trained text-to-image diffusion models, requiring no fine-tuning, prompts, or optimization. We propose Pixel-wise Attention Dissolution to remove an object by nullifying the most correlated attention keys for masked pixels, effectively eliminating the object from self-attention flow and allowing background context to dominate reconstruction. We further introduce Localized Attentional Disentanglement Guidance to steer denoising toward latent manifolds favorable to clean object removal. Together, these components enable precise, non-rigid, prompt-free, and scalable multi-object erasure in a single pass. Experiments demonstrate superior visual fidelity and semantic plausibility compared to state-of-the-art methods. The project page is available at https://vdkhoi20.github.io/PANDORA.
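Abstract (reference translation): Removing objects from natural images is difficult because semantically coherent content must be synthesized while background integrity is preserved. Existing methods often rely on fine-tuning, prompt engineering, or inference-time optimization, yet they still suffer from texture inconsistency, rigid artifacts, weak foreground-background disentanglement, and poor scalability for multi-object removal. This work proposes PANDORA, a novel zero-shot object removal framework that operates directly on pre-trained text-to-image diffusion models and requires no fine-tuning, prompts, or optimization. It removes an object by nullifying the most correlated attention keys for masked pixels, which effectively eliminates the object from the self-attention flow and lets background context dominate reconstruction. It further introduces Localized Attentional Disentanglement Guidance to steer denoising toward latent manifolds favorable to object removal. Together, these components enable precise, non-rigid, prompt-free, and scalable multi-object erasure. Experiments show superior visual fidelity and semantic plausibility compared to state-of-the-art methods. The project page is available at https://vdkhoi20.github.io/PANDORA.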
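
Illustrative sketch (not the authors' implementation): the abstract describes Pixel-wise Attention Dissolution only at a high level, as nullifying the most correlated attention keys for masked pixels. The toy example below shows one way such a step could look on plain Python lists. The function name `dissolved_attention_weights`, the `top_k` parameter, and the choice to nullify keys by setting their scores to negative infinity before the softmax are all assumptions made for illustration; the paper may select and suppress keys differently.

```python
import math

def dissolved_attention_weights(q, k, query_mask, top_k):
    """Softmax attention weights where each masked query ignores its top_k best-matching keys.

    Hypothetical helper for illustration only.
    q, k: lists of equal-length float vectors; query_mask: one bool per query.
    """
    n, d = len(k), len(q[0])
    assert 0 < top_k < n, "keep at least one key available per query"
    weights = []
    for qi, masked in zip(q, query_mask):
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if masked:
            # Assumption: "nullify" means removing the top_k highest-scoring keys.
            for j in sorted(range(n), key=scores.__getitem__)[-top_k:]:
                scores[j] = -math.inf
        m = max(scores)  # finite, because top_k < n
        exps = [math.exp(s - m) for s in scores]  # nullified keys contribute 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy self-attention over three "pixels"; only pixel 0 lies inside the object mask.
feats = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = dissolved_attention_weights(feats, feats, [True, False, False], top_k=1)
```

In this toy setup the masked pixel can no longer attend to its single best-matching key, so its output is rebuilt from the remaining keys, while unmasked pixels keep ordinary softmax attention. The guidance component mentioned in the abstract is not modeled here.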

