Fugu-MT Paper Translation (Abstract): Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding

Paper overview: Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding

arxiv url: http://arxiv.org/abs/2510.12256v1
Date: Tue, 14 Oct 2025 08:05:30 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.238922
Title: Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding
Title (reference translation): Vectorized video representation with easy editing via hierarchical spatio-temporally consistent proxy embedding
Authors: Ye Chen, Liming Tan, Yupeng Zhu, Yuanbin Wang, Bingbing Ni
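Abstract summary: The proposed representation achieves high video reconstruction accuracy with fewer parameters. It supports complex video processing tasks, including video inpainting and temporally consistent video editing.
Reference score (independently computed attention score): 45.593989778240655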
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current video representations heavily rely on unstable and over-grained priors for motion and appearance modelling, i.e., pixel-level matching and tracking. A tracking error of just a few pixels would lead to the collapse of the visual object representation, not to mention occlusions and large motion frequently occurring in videos. To overcome the above-mentioned vulnerability, this work proposes spatio-temporally consistent proxy nodes to represent dynamically changing objects/scenes in the video. On the one hand, the hierarchical proxy nodes have the ability to stably express the multi-scale structure of visual objects, so they are not affected by accumulated tracking error, long-term motion, occlusion, and viewpoint variation. On the other hand, the dynamic representation update mechanism of the proxy nodes adequately leverages spatio-temporal priors of the video to mitigate the impact of inaccurate trackers, thereby effectively handling drastic changes in scenes and objects. Additionally, the decoupled encoding manner of the shape and texture representations across different visual objects in the video facilitates controllable and fine-grained appearance editing capability. Extensive experiments demonstrate that the proposed representation achieves high video reconstruction accuracy with fewer parameters and supports complex video processing tasks, including video in-painting and keyframe-based temporally consistent video editing.
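Abstract (reference translation): Current video representations rely heavily on unstable and overly fine-grained priors for motion and appearance modelling, i.e., pixel-level matching and tracking. A tracking error of only a few pixels can collapse the visual object representation, to say nothing of the occlusions and large motions that occur frequently in videos. To overcome this vulnerability, this work proposes spatio-temporally consistent proxy nodes to represent dynamically changing objects and scenes in a video. On the one hand, the hierarchical proxy nodes can stably express the multi-scale structure of visual objects, so they are not affected by accumulated tracking error, long-term motion, occlusion, or viewpoint variation. On the other hand, the dynamic representation update mechanism of the proxy nodes makes appropriate use of the video's spatio-temporal priors to mitigate the impact of inaccurate trackers, and thereby handles drastic changes in scenes and objects effectively. In addition, encoding shape and texture representations separately across the different visual objects in the video facilitates controllable, fine-grained appearance editing. Extensive experiments show that the proposed representation achieves high video reconstruction accuracy with fewer parameters and supports complex video processing tasks.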

Related papers