FlowFormer: A Transformer Architecture and Its Masked Cost Volume
Autoencoding for Optical Flow
- URL: http://arxiv.org/abs/2306.05442v1
- Date: Thu, 8 Jun 2023 12:24:04 GMT
- Title: FlowFormer: A Transformer Architecture and Its Masked Cost Volume
Autoencoding for Optical Flow
- Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei
Qin, Jifeng Dai, Xiaogang Wang, and Hongsheng Li
- Abstract summary: This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume Autoencoding (MCVA) for pretraining it to tackle the problem of optical flow estimation.
FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture.
On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point-error (AEPE) on the clean and final pass, a 16.5% and 15.5% error reduction from GMA (1.388 and 2.47).
- Score: 49.40637769535569
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a novel transformer-based network architecture,
FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for
pretraining it to tackle the problem of optical flow estimation. FlowFormer
tokenizes the 4D cost-volume built from the source-target image pair and
iteratively refines flow estimation with a cost-volume encoder-decoder
architecture. The cost-volume encoder derives a cost memory with
alternate-group transformer (AGT) layers in a latent space and the decoder
recurrently decodes flow from the cost memory with dynamic positional cost
queries. On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and
2.09 average end-point-error (AEPE) on the clean and final pass, a 16.5% and
15.5% error reduction from GMA (1.388 and 2.47). MCVA enhances FlowFormer
by pretraining the cost-volume encoder with a masked autoencoding scheme, which
further unleashes the capability of FlowFormer with unlabeled data. This is
especially critical in optical flow estimation because ground truth flows are
more expensive to acquire than labels in other vision tasks. MCVA improves
FlowFormer across the board, and FlowFormer+MCVA ranks 1st among all published methods
on both Sintel and KITTI-2015 benchmarks and achieves the best generalization
performance. Specifically, FlowFormer+MCVA achieves 1.07 and 1.94 AEPE on the
Sintel benchmark, leading to 7.76% and 7.18% error reductions from
FlowFormer.
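The two core ideas above, tokenizing an all-pairs 4D cost volume and pretraining by masking parts of it, can be illustrated with a short sketch. This is a minimal NumPy illustration of the concepts, not the authors' implementation; all function and variable names are assumptions.

```python
import numpy as np

def build_cost_volume(feat_src, feat_tgt):
    """All-pairs 4D cost volume: dot-product similarity between every
    source and every target feature vector. feat_*: (H, W, D) arrays."""
    H, W, D = feat_src.shape
    src = feat_src.reshape(H * W, D)
    tgt = feat_tgt.reshape(H * W, D)
    cost = src @ tgt.T / np.sqrt(D)      # (H*W, H*W) similarity matrix
    return cost.reshape(H, W, H, W)      # 4D cost volume

def mask_cost_maps(cost, mask_ratio=0.5, rng=None):
    """MCVA-style masking sketch: hide a random subset of the
    per-source-pixel 2D cost maps; a pretraining objective would then
    train the encoder to reconstruct the hidden entries."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = cost.shape[:2]
    flat = cost.reshape(H * W, -1).copy()
    n_mask = int(mask_ratio * H * W)
    idx = rng.choice(H * W, size=n_mask, replace=False)
    flat[idx] = 0.0                      # zero out the masked cost maps
    mask = np.zeros(H * W, dtype=bool)
    mask[idx] = True
    return flat.reshape(cost.shape), mask.reshape(H, W)
```

Each source pixel thus owns a 2D cost map over all target positions; masking whole cost maps (rather than individual values) is what makes the reconstruction task meaningful for a cost-volume encoder.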
Related papers
- Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume [6.122542233250026]
We present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation.
Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
arXiv Detail & Related papers (2023-12-06T12:43:11Z)
- DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow [44.57023882737517]
We introduce a lightweight low-latency and memory-efficient model for optical flow estimation.
DIFT is feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras.
We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP, an efficient mobile AI accelerator.
arXiv Detail & Related papers (2023-06-09T06:10:59Z)
- FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation [35.0926239683689]
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.
We propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme.
FlowFormer++ ranks 1st among published methods on both Sintel and KITTI-2015 benchmarks.
arXiv Detail & Related papers (2023-03-02T13:28:07Z)
- FlowFormer: A Transformer Architecture for Optical Flow [40.6027845855481]
Optical Flow TransFormer (FlowFormer) is a transformer-based neural network architecture for learning optical flow.
FlowFormer tokenizes the 4D cost volume built from an image pair and encodes the cost tokens into a cost memory with alternate-group transformer layers.
On the Sintel benchmark clean pass, FlowFormer achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the best published result (1.388).
arXiv Detail & Related papers (2022-03-30T10:33:09Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms RAFT with 32 refinement iterations on the challenging Sintel benchmark.
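The global matching step described above, a correlation followed by a softmax over all target positions, can be sketched as follows. This is an illustrative NumPy approximation of the idea, not GMFlow's actual API; all names and shapes are assumptions.

```python
import numpy as np

def global_match_flow(feat_src, feat_tgt):
    """Softmax-based global matching sketch: correlate each source
    feature with all target features, softmax over target positions,
    and take the expected target coordinate minus the source
    coordinate as the flow vector. feat_*: (H, W, D) arrays."""
    H, W, D = feat_src.shape
    src = feat_src.reshape(H * W, D)
    tgt = feat_tgt.reshape(H * W, D)
    corr = src @ tgt.T / np.sqrt(D)                 # (HW, HW) correlation
    corr -= corr.max(axis=1, keepdims=True)         # numerical stability
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)         # softmax over targets
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    matched = prob @ coords                         # expected target coords
    flow = (matched - coords).reshape(H, W, 2)      # channel 0: x, 1: y
    return flow
```

Because the softmax covers every target position, this formulation can recover large displacements in a single matching step, which is the motivation for global matching over local correlation windows.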
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation [99.19322851246972]
We introduce LiteFlowNet3, a deep network consisting of two specialized modules to address the problem of optical flow estimation.
LiteFlowNet3 not only achieves promising results on public benchmarks but also has a small model size and a fast runtime.
arXiv Detail & Related papers (2020-07-18T03:30:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.