Fugu-MT Paper Translation (Abstract): CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation

Paper overview: CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation

arxiv url: http://arxiv.org/abs/2605.12938v1
Date: Wed, 13 May 2026 03:18:26 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.781428
Title: CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation
Title (reference translation): CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation
Authors: Seonghyun Jin, Youngmin Kim, Sunwoo Park, Jong Chul Ye
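Abstract summary: The paper proposes Curved Ray Expectation Positional Encoding (CRePE) for camera-conditioned video generation. CRePE represents each image token as a depth-aware positional distribution along its source ray, capturing the projected-path geometry induced by wide-angle and fisheye cameras. The design leads to more stable camera control and improves geometry-aware and perceptual-quality metrics.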
Reference score (independently computed attention score): 48.95338161181985
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Camera-conditioned video generation requires positional encoding that remains reliable under changes in camera motion, lens configuration, and scene structure. However, existing attention-level camera encodings either provide ray-only camera signals or rely on pinhole camera geometry, limiting their applicability to general camera control under the Unified Camera Model, including wide-angle and fisheye lenses. To address this limitation, we propose Curved Ray Expectation Positional Encoding (CRePE). CRePE represents each image token as a depth-aware positional distribution along its source ray, providing a Unified Camera Model-compatible positional encoding that captures the projected-path geometry induced by wide-angle and fisheye cameras. CRePE is implemented through a Geometric Attention Adapter added to frozen video DiTs, injecting token-wise scene-distance information into selected attention layers and stabilizing it with pseudo supervision from a monocular geometry foundation model. This design leads to more stable camera control and improves several geometry-aware and perceptual-quality metrics, while remaining competitive on video-quality metrics. Controlled positional-encoding ablations show a better overall average rank than a RayRoPE-style endpoint PE baseline, demonstrating the effectiveness of UCM-aware projected-path integration across diverse camera models. Furthermore, by extending the same positional-encoding pathway to external geometry control through Radial MixForcing, CRePE supports external radial-map control for scene-geometry-conditioned generation and source-video motion transfer beyond camera control.
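Abstract (reference translation): Camera-conditioned video generation needs a positional encoding that stays reliable when camera motion, lens configuration, and scene structure change. Existing attention-level camera encodings, however, either provide ray-only camera signals or depend on pinhole camera geometry, which limits their applicability to general camera control under the Unified Camera Model, including wide-angle and fisheye lenses. To address this limitation, the authors propose Curved Ray Expectation Positional Encoding (CRePE). CRePE represents each image token as a depth-aware positional distribution along its source ray, giving a Unified Camera Model-compatible positional encoding that captures the projected-path geometry induced by wide-angle and fisheye cameras. CRePE is implemented through a Geometric Attention Adapter added to frozen video DiTs; the adapter injects token-wise scene-distance information into selected attention layers and stabilizes it with pseudo supervision from a monocular geometry foundation model. This design yields more stable camera control and improves several geometry-aware and perceptual-quality metrics while remaining competitive on video-quality metrics. Controlled positional-encoding ablations show a better overall average rank than a RayRoPE-style endpoint PE baseline, demonstrating the effectiveness of UCM-aware projected-path integration across diverse camera models. Furthermore, by extending the same positional-encoding pathway to external geometry control through Radial MixForcing, CRePE supports external radial-map control for scene-geometry-conditioned generation and source-video motion transfer beyond camera control.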
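
The abstract's central idea, a token position taken as an expectation over points along a source ray projected under the Unified Camera Model (UCM), can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard single-parameter UCM (mirror parameter `xi`, where `xi = 0` is the pinhole case), assumes the ray is already expressed in the target camera frame, and uses hypothetical function names, intrinsics, distances, and weights chosen only for illustration.

```python
import math


def ucm_project(point, fx, fy, cx, cy, xi):
    """Project a 3D point in the camera frame to pixel coordinates (UCM)."""
    x, y, z = point
    d = math.sqrt(x * x + y * y + z * z)
    denom = xi * d + z  # xi = 0 gives the ordinary pinhole division by z
    if denom <= 0.0:
        # Simplified validity check, adequate for 0 <= xi <= 1.
        raise ValueError("point cannot be projected")
    return fx * x / denom + cx, fy * y / denom + cy


def ucm_unproject(pixel, fx, fy, cx, cy, xi):
    """Return the unit ray direction that projects to the given pixel."""
    mx = (pixel[0] - cx) / fx
    my = (pixel[1] - cy) / fy
    r2 = mx * mx + my * my
    # Assumes 1 + (1 - xi^2) * r2 >= 0, which always holds for xi <= 1.
    factor = (xi + math.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (1.0 + r2)
    return factor * mx, factor * my, factor - xi


def expected_projection(origin, direction, distances, weights, cam):
    """Weighted mean pixel position of points sampled along a ray.

    `origin` and `direction` are given in the target camera frame
    (an assumption of this sketch); `weights` play the role of a
    discrete distribution over `distances` along the ray.
    """
    total = sum(weights)
    eu = ev = 0.0
    for t, w in zip(distances, weights):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        u, v = ucm_project(p, **cam)
        eu += w * u
        ev += w * v
    return eu / total, ev / total


# Hypothetical fisheye-like camera and a ray offset from the optical center.
cam = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0, xi=0.8)
ray_dir = ucm_unproject((120.0, 80.0), **cam)
mean_uv = expected_projection((0.5, 0.0, 0.0), ray_dir,
                              [1.0, 2.0, 4.0], [0.2, 0.5, 0.3], cam)
```

With `xi = 0` the projected samples of a 3D ray lie on a straight image line, while for `xi > 0` they trace a curve, which is why an endpoint-only encoding and a path-aware expectation can differ for wide-angle and fisheye lenses.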

Related papers