Fugu-MT Paper Translation (Abstract): FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

Paper abstract: FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

arxiv url: http://arxiv.org/abs/2306.05442v1
Date: Thu, 8 Jun 2023 12:24:04 GMT
Status: translation complete
Last updated in system: 2023-06-12 16:04:44.695847
Title: FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Title (reference translation): FlowFormer: a transformer architecture for optical flow and its masked cost volume autoencoding
Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei Qin, Jifeng Dai, Xiaogang Wang, and Hongsheng Li
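Abstract summary: This paper introduces FlowFormer, a novel transformer-based network architecture, together with Masked Cost Volume AutoEncoding (MCVA), to tackle the problem of optical flow estimation. FlowFormer tokenizes the 4D cost volume built from the source-target image pair and iteratively refines the flow estimate with a cost-volume encoder-decoder architecture. On the Sintel benchmark, the FlowFormer architecture achieves an average end-point error (AEPE) of 1.16 on the clean pass and 2.09 on the final pass, error reductions of 16.5% and 15.5% respectively.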
Reference score (independently computed attention score): 49.40637769535569
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume AutoEncoding (MCVA) for pretraining it, to tackle the problem of optical flow estimation. FlowFormer tokenizes the 4D cost volume built from the source-target image pair and iteratively refines the flow estimate with a cost-volume encoder-decoder architecture. The cost-volume encoder derives a cost memory with alternate-group transformer (AGT) layers in a latent space, and the decoder recurrently decodes flow from the cost memory with dynamic positional cost queries. On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from GMA (1.388 and 2.47). MCVA enhances FlowFormer by pretraining the cost-volume encoder with a masked autoencoding scheme, which further unlocks the capability of FlowFormer with unlabeled data. This is especially critical in optical flow estimation because ground-truth flows are more expensive to acquire than labels in other vision tasks. MCVA improves FlowFormer across the board: FlowFormer+MCVA ranks first among all published methods on both the Sintel and KITTI-2015 benchmarks and achieves the best generalization performance. Specifically, FlowFormer+MCVA achieves 1.07 and 1.94 AEPE on the Sintel benchmark, a 7.76% and 7.18% error reduction from FlowFormer.
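Abstract (reference translation): This paper introduces FlowFormer, a novel transformer-based network architecture, and Masked Cost Volume AutoEncoding (MCVA) to tackle the problem of optical flow estimation. FlowFormer tokenizes the 4D cost volume built from the source-target image pair and iteratively refines the flow estimate with a cost-volume encoder-decoder architecture. The cost-volume encoder derives a cost memory with alternate-group transformer (AGT) layers in a latent space, and the decoder recurrently decodes flow from the cost memory with dynamic positional cost queries. On the Sintel benchmark, the FlowFormer architecture achieves end-point errors (AEPE) of 1.16 and 2.09 on the clean and final passes, a 16.5% and 15.5% error reduction from GMA (1.388 and 2.47). MCVA enhances FlowFormer by pretraining the cost-volume encoder with a masked autoencoding scheme, further unlocking FlowFormer's capability with unlabeled data. This is especially important in optical flow estimation because ground-truth flows are more expensive to acquire than labels in other vision tasks. MCVA improves FlowFormer across the board; FlowFormer+MCVA ranks first among all published methods on the Sintel and KITTI-2015 benchmarks and achieves the best generalization performance. Specifically, FlowFormer+MCVA achieves 1.07 and 1.94 AEPE on the Sintel benchmark.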
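The AEPE metric and the relative error reductions quoted in the abstract can be illustrated with a minimal Python sketch. It assumes AEPE is the mean Euclidean distance between predicted and ground-truth flow vectors, which is the usual definition but is not spelled out in the abstract. The function names `aepe` and `relative_reduction` and the toy flow values are hypothetical, chosen only for illustration; they are not taken from the paper's code.

```python
import math


def aepe(pred, gt):
    """Average end-point error: mean Euclidean distance between
    predicted and ground-truth flow vectors, given as (u, v) pairs."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


def relative_reduction(baseline, improved):
    """Relative error reduction from baseline to improved, in percent."""
    return (baseline - improved) / baseline * 100.0


# Toy example (made-up flow vectors, not benchmark data):
# one pixel is off by (3, 4), i.e. 5 px, the other is exact.
pred = [(3.0, 4.0), (0.0, 0.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]
print(aepe(pred, gt))  # 2.5

# AEPE values quoted in the abstract: FlowFormer vs. FlowFormer+MCVA.
print(round(relative_reduction(1.16, 1.07), 2))  # 7.76 (Sintel clean)
print(round(relative_reduction(2.09, 1.94), 2))  # 7.18 (Sintel final)
```

The last two lines recover the 7.76% and 7.18% reductions stated for FlowFormer+MCVA relative to FlowFormer.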

Related papers