Fugu-MT Paper Translation (Abstract): MemCam: Memory-Augmented Camera Control for Consistent Video Generation

Paper overview: MemCam: Memory-Augmented Camera Control for Consistent Video Generation

arxiv url: http://arxiv.org/abs/2603.26193v1
Date: Fri, 27 Mar 2026 09:11:31 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.417496
Title: MemCam: Memory-Augmented Camera Control for Consistent Video Generation
Title (reference translation): MemCam: Memory-Augmented Camera Control for Consistent Video Generation
Authors: Xinhang Gao, Junlin Guan, Shuhan Luo, Wenzhuo Li, Guanghuan Tan, Jiacheng Wang
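Abstract summary: Existing methods struggle to maintain scene consistency during long video generation under dynamic camera control. MemCam is a memory-augmented interactive video generation approach that treats previously generated frames as external memory. MemCam significantly outperforms open-source state-of-the-art approaches in terms of scene consistency.
Reference score (independently computed attention score): 2.6353739437625348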
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Interactive video generation has significant potential for scene simulation and video creation. However, existing methods often struggle with maintaining scene consistency during long video generation under dynamic camera control due to limited contextual information. To address this challenge, we propose MemCam, a memory-augmented interactive video generation approach that treats previously generated frames as external memory and leverages them as contextual conditioning to achieve controllable camera viewpoints with high scene consistency. To enable longer and more relevant context, we design a context compression module that encodes memory frames into compact representations and employs co-visibility-based selection to dynamically retrieve the most relevant historical frames, thereby reducing computational overhead while enriching contextual information. Experiments on interactive video generation tasks show that MemCam significantly outperforms existing baseline methods as well as open-source state-of-the-art approaches in terms of scene consistency, particularly in long video scenarios with large camera rotations.
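Abstract (reference translation): Interactive video generation has great potential for scene simulation and video creation. However, because contextual information is limited, existing methods often struggle to maintain scene consistency during long video generation under dynamic camera control. To address this challenge, we propose MemCam, a memory-augmented interactive video generation approach. We design a context compression module that encodes memory frames into compact representations and uses co-visibility-based selection to dynamically retrieve the most relevant historical frames, reducing computational overhead while enriching contextual information. Experiments on interactive video generation tasks show that MemCam significantly outperforms existing baseline methods and open-source state-of-the-art approaches in terms of scene consistency, particularly in long video scenarios with large camera rotations.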
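
The abstract does not say how co-visibility is measured or how frames are retrieved, so the following is only a minimal sketch of the general idea of co-visibility-based selection, not the MemCam implementation. It assumes a camera that only rotates about a vertical axis and uses the fraction of shared horizontal field of view as a stand-in for co-visibility. The names `MemoryFrame`, `angular_overlap`, and `select_memory_frames`, the 90-degree field of view, and the top-k rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryFrame:
    index: int      # position of the frame in the generated video
    yaw_deg: float  # camera yaw (degrees) when the frame was generated


def angular_overlap(yaw_a, yaw_b, fov_deg=90.0):
    """Fraction of the horizontal field of view shared by two yaw-only cameras.

    Assumption: both cameras sit at the same position and fov_deg <= 180.
    """
    # Smallest absolute angle between the two headings, in [0, 180].
    diff = abs((yaw_a - yaw_b + 180.0) % 360.0 - 180.0)
    return max(0.0, 1.0 - diff / fov_deg)


def select_memory_frames(memory, target_yaw_deg, k=2, fov_deg=90.0):
    """Return up to k frames with the highest co-visibility proxy, best first."""
    scored = [(angular_overlap(f.yaw_deg, target_yaw_deg, fov_deg), f) for f in memory]
    # Drop frames that share no view with the target pose.
    visible = [(score, frame) for score, frame in scored if score > 0.0]
    # Higher overlap first; ties go to the more recent frame.
    visible.sort(key=lambda sf: (sf[0], sf[1].index), reverse=True)
    return [frame for _, frame in visible[:k]]


if __name__ == "__main__":
    memory = [MemoryFrame(i, yaw) for i, yaw in enumerate([0.0, 45.0, 90.0, 180.0, 350.0])]
    for frame in select_memory_frames(memory, target_yaw_deg=10.0, k=2):
        print(frame.index, frame.yaw_deg)
```

In this toy setting, a frame generated while the camera faced the opposite direction scores zero and is never retrieved, while a frame from an earlier pass over the same heading can be selected even if it is far back in the video. A real system would need full camera poses and scene geometry rather than a single yaw angle.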

Related papers