Fugu-MT 論文翻訳(概要): CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

論文の概要: CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

arxiv url: http://arxiv.org/abs/2308.07926v1
Date: Tue, 15 Aug 2023 17:59:56 GMT
ステータス: 翻訳完了
システム内更新日: 2023-08-16 11:43:39.320165
Title: CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Title（参考訳）: CoDeF:一時連続ビデオ処理のためのコンテンツ変形場
Authors: Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
Abstract要約: CoDeFは、標準コンテンツフィールドと時間変形フィールドからなる新しいタイプのビデオ表現である。実験により,CoDeFは,映像から映像への変換とキーポイント検出をキーポイントトラッキングに,トレーニングなしで持ち上げることができることを示した。
参考スコア（独自算出の注目度）: 89.49585127724941
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis.Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline.We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video.With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field.We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training.More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog.Project page can be found at https://qiuyu96.github.io/CoDeF/.
Abstract（参考訳）: We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis.Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline.We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video.With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field.We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training.More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog.Project page can be found at https://qiuyu96.github.io/CoDeF/.

関連論文リスト

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control [47.34885131252508]
ビデオのインペイントは、腐敗したビデオコンテンツを復元することを目的としている。マスク付きビデオを処理するための新しいデュアルストリームパラダイムVideoPainterを提案する。また,任意の長さの映像を描ける新しいターゲット領域ID再サンプリング手法も導入する。
論文参考訳（メタデータ） (2025-03-07T17:59:46Z)
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
ダイナミックスビデオのモデリングのため、ビデオ事前トレーニングは難しい。本稿では,ビデオ事前学習におけるこのような制限を,効率的なビデオ分解によって解決する。筆者らのフレームワークは,13のマルチモーダルベンチマークにおいて,画像と映像のコンテントの理解と生成が可能であることを実証した。
論文参考訳（メタデータ） (2024-02-05T16:30:49Z)
GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
我々は1つの静止画像を生成変形場(GenDeF)でワープすることで映像をレンダリングすることを提案する。このようなパイプラインには,魅力的なメリットが3つあります。
論文参考訳（メタデータ） (2023-12-07T18:59:41Z)
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
我々はtextitLatentWarp というゼロショット動画翻訳フレームワークを提案する。我々のアプローチは単純で、クエリトークンの時間的一貫性を制約するために、潜伏した空間にワープ操作を組み込む。 textitLatentWarpの時間的コヒーレンスによるビデオ間翻訳における優位性を示す実験結果を得た。
論文参考訳（メタデータ） (2023-11-01T08:02:57Z)
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval [24.691270610091554]
本稿では,ビデオから意味的に強調された表現を純粋に学習し,ビデオ表現をオフラインで計算し,異なるテキストに対して再利用することを目的とする。 MSR-VTT, MSVD, LSMDCの3つのベンチマークデータセット上で, 最先端のパフォーマンスを得る。
論文参考訳（メタデータ） (2023-08-15T08:54:25Z)
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GANのインバージョンは、実画像に強力な編集可能性を適用するのに不可欠である。既存のビデオフレームを個別に反転させる手法は、時間の経過とともに望ましくない一貫性のない結果をもたらすことが多い。我々は、textbfRecurrent vtextbfIdeo textbfGAN textbfInversion and etextbfDiting (RIGID) という統合されたリカレントフレームワークを提案する。本フレームワークは,入力フレーム間の固有コヒーレンスをエンドツーエンドで学習する。
論文参考訳（メタデータ） (2023-08-11T12:17:24Z)
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
本稿では,動画に画像モデルを適用するための新しいテキスト誘導型動画翻訳フレームワークを提案する。我々のフレームワークは,グローバルなスタイルと局所的なテクスチャの時間的一貫性を低コストで実現している。
論文参考訳（メタデータ） (2023-06-13T17:52:23Z)
WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDOは、過去のビデオフレームを予測するための新しいアプローチである。個々の画像は、オブジェクトマスクと小さなコントロールポイントのセットを組み合わせた複数の層に分解される。レイヤ構造は、各ビデオ内のすべてのフレーム間で共有され、フレーム間の密接な接続を構築する。
論文参考訳（メタデータ） (2022-11-25T18:59:46Z)
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens [93.98605636451806]
StructureViTは、トレーニング中にのみ利用可能な少数の画像の構造を利用することで、ビデオモデルを改善する方法を示している。 SViTでは、複数のビデオ理解タスクとデータセットのパフォーマンスが大幅に向上している。
論文参考訳（メタデータ） (2022-06-13T17:45:05Z)
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning [40.556222166309524]
ビデオキャプションのためのエンドツーエンドトランスフォーマーモデルであるSwinBERTを提案する。提案手法では,ビデオ入力の可変長に適応可能な空間時間表現を符号化するために,ビデオトランスフォーマを採用している。このモデルアーキテクチャに基づいて,より密集したビデオフレームの映像キャプションが有用であることを示す。
論文参考訳（メタデータ） (2021-11-25T18:02:12Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。