Fugu-MT Paper Translation (Abstract): Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos

Paper summary: Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos

arxiv url: http://arxiv.org/abs/2603.29036v1
Date: Mon, 30 Mar 2026 22:08:46 GMT
Status: Translation complete
Last updated in system: 2026-04-01 15:25:02.882155
Title: Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos
Authors: Yujin Ham, Junho Kim, Vivek Boominathan, Guha Balakrishnan
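Abstract summary: Egocentric "walking tour" videos provide a rich source of image data for developing diverse visual models of environments around the world. Because the many humans visible in these videos limit their usefulness for environment modeling, the authors develop a generative algorithm that realistically removes humans and their associated shadow effects from walking tour videos. The resulting clips can be used to build 3D/4D models of urban locations.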
Reference score (independently computed attention level): 24.467955018300895
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Egocentric "walking tour" videos provide a rich source of image data to develop rich and diverse visual models of environments around the world. However, the significant presence of humans in frames of these videos due to crowds and eye-level camera perspectives diminishes their usefulness in environment modeling applications. We focus on addressing this challenge by developing a generative algorithm that can realistically remove (i.e., inpaint) humans and their associated shadow effects from walking tour videos. Key to our approach is the construction of a rich semi-synthetic dataset of video clip pairs to train this generative model. Each pair in the dataset consists of an environment-only background clip, and a composite clip of walking humans with simulated shadows overlaid on the background. We randomly sourced both foreground and background components from real egocentric walking tour videos around the world to maintain visual diversity. We then used this dataset to fine-tune the state-of-the-art Casper video diffusion model for object and effects inpainting, and demonstrate that the resulting model performs far better than Casper both qualitatively and quantitatively at removing humans from walking tour clips with significant human presence and complex backgrounds. Finally, we show that the resulting generated clips can be used to build successful 3D/4D models of urban locations.
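The pairing described in the abstract (an environment-only background clip plus a composite of walking humans and simulated shadows over that background) can be illustrated with a minimal sketch. The abstract does not say how the shadows are simulated or how the layers are blended, so everything below is an assumption made for illustration: the function names, the per-pixel alpha matte for the person layer, the multiplicative shadow model, and the `shadow_strength` parameter are hypothetical and are not the authors' pipeline.

```python
import numpy as np


def composite_frame(background, person_rgb, person_alpha, shadow_mask,
                    shadow_strength=0.5):
    """Overlay a person layer and a simulated shadow on one background frame.

    Assumed inputs (hypothetical, not taken from the paper):
      background, person_rgb: float arrays of shape (H, W, 3) in [0, 1]
      person_alpha, shadow_mask: float arrays of shape (H, W) in [0, 1]
      shadow_strength: fraction of brightness removed under a full shadow
    """
    # Darken the background where the simulated shadow falls.
    shaded = background * (1.0 - shadow_strength * shadow_mask[..., None])
    # Standard "over" alpha compositing of the person onto the shaded frame.
    alpha = person_alpha[..., None]
    return alpha * person_rgb + (1.0 - alpha) * shaded


def make_training_pair(background_clip, person_clip, alpha_clip, shadow_clip):
    """Return (composite_clip, background_clip) for arrays with a leading time axis."""
    frames = [
        composite_frame(b, p, a, s)
        for b, p, a, s in zip(background_clip, person_clip, alpha_clip, shadow_clip)
    ]
    # The composite clip is the model input; the clean background is the target.
    return np.stack(frames), background_clip
```

In this sketch the composite clip would play the role of the input and the untouched background clip the role of the removal target, which matches the pair structure the abstract describes without claiming anything about the actual training setup.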

Related papers