Fugu-MT Paper Translation (Summary): HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

Paper summary: HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

arxiv url: http://arxiv.org/abs/2605.17543v3
Date: Sat, 23 May 2026 05:05:23 GMT
Status: Translation complete
Last updated in system: 2026-05-26 16:32:37.644074
Title: HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos
Title (reference translation): HL-OutPaint: Coarse-to-Fine Outpainting for High-Resolution Long-Range Videos
Authors: Jeongeun Park, Janghyeok Han, Geonung Kim, Hyun-Seung Lee, Kyuha Choi, Youngseok Han, Sunghyun Cho
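Abstract summary: Video outpainting generates plausible visual content beyond the original spatial extent of a video sequence. HL-OutPaint is a high-resolution video outpainting framework for long sequences. The framework achieves stable, coherent generation for large spatial expansion and long video sequences.
Reference score (independently computed attention score): 14.04822758023478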
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video outpainting generates plausible visual content beyond the original spatial extent of a video, playing a key role in adapting videos to diverse display formats. To support such use cases, it must enable large spatial extrapolation over long sequences. However, most existing methods address only one of these challenges or lack explicit mechanisms for ensuring global spatio-temporal consistency, leading to notable limitations. In this paper, we propose HL-OutPaint, a high-resolution video outpainting framework for long sequences. Our approach follows a coarse-to-fine strategy with a two-stage pipeline. We first construct Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across the video. Unlike naive downsampling, GCG is built via a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information during sampling. This enables GCG to encode both long-term structural consistency and short-term temporal dynamics in a unified representation. Guided by this representation, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, our framework achieves stable, coherent generation for large spatial expansion and long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.
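Abstract (reference translation): Video outpainting generates plausible visual content beyond the original spatial extent of a video and plays a key role in adapting videos to diverse display formats. Supporting such use cases requires enabling large spatial extrapolation over long sequences. However, most existing methods address only one of these challenges or lack explicit mechanisms for ensuring global spatio-temporal consistency, which leads to notable limitations. This paper proposes HL-OutPaint, a high-resolution video outpainting framework for long sequences. The approach follows a coarse-to-fine strategy with a two-stage pipeline. It first constructs Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across the video. Unlike naive downsampling, GCG is built by a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information during sampling. This allows GCG to encode both long-term structural consistency and short-term temporal dynamics in a unified representation. Guided by this representation, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, the framework achieves stable, coherent generation for large spatial expansion and long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.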
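
The abstract describes coupling sparse global keyframes with local temporal windows and exchanging information between them during sampling, but gives no implementation details. The Python sketch below is only a toy illustration of that bookkeeping, not the authors' method: the function names (`keyframe_indices`, `local_windows`, `swap_shared_frames`), the evenly spaced keyframe choice, the non-overlapping windows, and the use of plain strings in place of latent tensors are all assumptions made for this example.

```python
# Toy illustration only. All names and design choices here are hypothetical
# and are NOT taken from the HL-OutPaint paper.

def keyframe_indices(num_frames, num_keyframes):
    """Return evenly spaced frame indices, including the first and last frame."""
    if not 2 <= num_keyframes <= num_frames:
        raise ValueError("need 2 <= num_keyframes <= num_frames")
    last = num_frames - 1
    # Integer arithmetic keeps the indices exact and strictly increasing.
    return [(i * last) // (num_keyframes - 1) for i in range(num_keyframes)]


def local_windows(num_frames, window_size):
    """Split the frame range into consecutive, non-overlapping windows."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    return [
        list(range(start, min(start + window_size, num_frames)))
        for start in range(0, num_frames, window_size)
    ]


def swap_shared_frames(global_track, local_track):
    """Exchange, in place, the entries at frame indices present in both tracks.

    Each track maps a frame index to that frame's current state
    (a latent tensor in a real sampler, a string in this toy example).
    """
    for idx in global_track.keys() & local_track.keys():
        global_track[idx], local_track[idx] = local_track[idx], global_track[idx]


# Small demonstration with 9 frames, 3 keyframes, and windows of 3 frames.
keys = keyframe_indices(9, 3)
windows = local_windows(9, 3)
global_track = {k: f"global-{k}" for k in keys}
local_tracks = [{i: f"local-{i}" for i in w} for w in windows]

# One hypothetical "exchange" step: every window swaps the frames it shares
# with the sparse global keyframe set.
for track in local_tracks:
    swap_shared_frames(global_track, track)
```

In a real sampler such an exchange would presumably be interleaved with denoising steps so that each branch sees the other's estimate of the shared frames; how often to swap and how windows overlap are left open by the abstract.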

Related papers