FlowFormer: A Transformer Architecture and Its Masked Cost Volume
Autoencoding for Optical Flow
- URL: http://arxiv.org/abs/2306.05442v1
- Date: Thu, 8 Jun 2023 12:24:04 GMT
- Title: FlowFormer: A Transformer Architecture and Its Masked Cost Volume
Autoencoding for Optical Flow
- Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei
Qin, Jifeng Dai, Xiaogang Wang, and Hongsheng Li
- Abstract summary: This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume Autoencoding (MCVA) for pretraining it to tackle the problem of optical flow estimation.
FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture.
On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point-error (AEPE) on the clean and final pass, a 16.5% and 15.5% error reduction from GMA (1.388 and 2.47).
- Score: 49.40637769535569
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a novel transformer-based network architecture,
FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for
pretraining it to tackle the problem of optical flow estimation. FlowFormer
tokenizes the 4D cost-volume built from the source-target image pair and
iteratively refines flow estimation with a cost-volume encoder-decoder
architecture. The cost-volume encoder derives a cost memory with
alternate-group transformer (AGT) layers in a latent space and the decoder
recurrently decodes flow from the cost memory with dynamic positional cost
queries. On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and
2.09 average end-point-error (AEPE) on the clean and final pass, a 16.5% and
15.5% error reduction from GMA (1.388 and 2.47). MCVA enhances FlowFormer
by pretraining the cost-volume encoder with a masked autoencoding scheme, which
further unleashes the capability of FlowFormer with unlabeled data. This is
especially critical in optical flow estimation because ground truth flows are
more expensive to acquire than labels in other vision tasks. MCVA improves
FlowFormer across the board, and FlowFormer+MCVA ranks 1st among all published methods
on both Sintel and KITTI-2015 benchmarks and achieves the best generalization
performance. Specifically, FlowFormer+MCVA achieves 1.07 and 1.94 AEPE on the
Sintel benchmark, leading to 7.76% and 7.18% error reductions from
FlowFormer.
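The two core ideas above, tokenizing an all-pairs 4D cost volume and pretraining by masking parts of it, can be illustrated with a short sketch. This is a minimal NumPy illustration of the concepts, not the authors' implementation; all function and variable names are assumptions.

```python
import numpy as np

def build_cost_volume(feat_src, feat_tgt):
    """All-pairs 4D cost volume: dot-product similarity between every
    source and every target feature vector. feat_*: (H, W, D) arrays."""
    H, W, D = feat_src.shape
    src = feat_src.reshape(H * W, D)
    tgt = feat_tgt.reshape(H * W, D)
    cost = src @ tgt.T / np.sqrt(D)      # (H*W, H*W) similarity matrix
    return cost.reshape(H, W, H, W)      # 4D cost volume

def mask_cost_maps(cost, mask_ratio=0.5, rng=None):
    """MCVA-style masking sketch: hide a random subset of the
    per-source-pixel 2D cost maps; a pretraining objective would then
    train the encoder to reconstruct the hidden entries."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = cost.shape[:2]
    flat = cost.reshape(H * W, -1).copy()
    n_mask = int(mask_ratio * H * W)
    idx = rng.choice(H * W, size=n_mask, replace=False)
    flat[idx] = 0.0                      # zero out the masked cost maps
    mask = np.zeros(H * W, dtype=bool)
    mask[idx] = True
    return flat.reshape(cost.shape), mask.reshape(H, W)
```

Each source pixel thus owns a 2D cost map over all target positions; masking whole cost maps (rather than individual values) is what makes the reconstruction task meaningful for a cost-volume encoder.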
Related papers
- Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume [6.122542233250026]
We present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation.
Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
arXiv Detail & Related papers (2023-12-06T12:43:11Z)
- DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow [44.57023882737517]
We introduce a lightweight low-latency and memory-efficient model for optical flow estimation.
DIFT is feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras.
We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP, an efficient mobile AI accelerator.
arXiv Detail & Related papers (2023-06-09T06:10:59Z)
- FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation [35.0926239683689]
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.
We propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme.
FlowFormer++ ranks 1st among published methods on both Sintel and KITTI-2015 benchmarks.
arXiv Detail & Related papers (2023-03-02T13:28:07Z)
- FlowFormer: A Transformer Architecture for Optical Flow [40.6027845855481]
Optical Flow TransFormer (FlowFormer) is a transformer-based neural network architecture for learning optical flow.
FlowFormer tokenizes the 4D cost volume built from an image pair and encodes the cost tokens into a cost memory with alternate-group transformer layers.
On the Sintel benchmark clean pass, FlowFormer achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the best published result (1.388).
arXiv Detail & Related papers (2022-03-30T10:33:09Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms RAFT with 32 refinement iterations on the challenging Sintel benchmark.
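The global matching step described above, a correlation followed by a softmax over all target positions, can be sketched as follows. This is an illustrative NumPy approximation of the idea, not GMFlow's actual API; all names and shapes are assumptions.

```python
import numpy as np

def global_match_flow(feat_src, feat_tgt):
    """Softmax-based global matching sketch: correlate each source
    feature with all target features, softmax over target positions,
    and take the expected target coordinate minus the source
    coordinate as the flow vector. feat_*: (H, W, D) arrays."""
    H, W, D = feat_src.shape
    src = feat_src.reshape(H * W, D)
    tgt = feat_tgt.reshape(H * W, D)
    corr = src @ tgt.T / np.sqrt(D)                 # (HW, HW) correlation
    corr -= corr.max(axis=1, keepdims=True)         # numerical stability
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)         # softmax over targets
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    matched = prob @ coords                         # expected target coords
    flow = (matched - coords).reshape(H, W, 2)      # channel 0: x, 1: y
    return flow
```

Because the softmax covers every target position, this formulation can recover large displacements in a single matching step, which is the motivation for global matching over local correlation windows.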
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation [99.19322851246972]
We introduce LiteFlowNet3, a deep network consisting of two specialized modules to address the problem of optical flow estimation.
LiteFlowNet3 not only achieves promising results on public benchmarks but also has a small model size and a fast runtime.
arXiv Detail & Related papers (2020-07-18T03:30:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.