FlowFormer: A Transformer Architecture for Optical Flow
- URL: http://arxiv.org/abs/2203.16194v1
- Date: Wed, 30 Mar 2022 10:33:09 GMT
- Title: FlowFormer: A Transformer Architecture for Optical Flow
- Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung,
Hongwei Qin, Jifeng Dai, and Hongsheng Li
- Abstract summary: Optical Flow TransFormer (FlowFormer) is a transformer-based neural network architecture for learning optical flow.
FlowFormer tokenizes the 4D cost volume built from an image pair and encodes the cost tokens into a cost memory with alternate-group transformer layers.
On the Sintel benchmark clean pass, FlowFormer achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the best published result (1.388).
- Score: 40.6027845855481
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Optical Flow TransFormer (FlowFormer), a transformer-based
neural network architecture for learning optical flow. FlowFormer tokenizes the
4D cost volume built from an image pair, encodes the cost tokens into a cost
memory with alternate-group transformer (AGT) layers in a novel latent space,
and decodes the cost memory via a recurrent transformer decoder with dynamic
positional cost queries. On the Sintel benchmark clean pass, FlowFormer
achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the
best published result (1.388). Moreover, FlowFormer achieves strong
generalization performance. Without being trained on Sintel, FlowFormer
achieves 1.00 AEPE on the Sintel training set clean pass, outperforming the
best published result (1.29) by 22.4%.
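To make the pipeline concrete, the sketch below shows one standard way to build an all-pairs 4D cost volume from a pair of feature maps, flatten it into per-pixel cost tokens, and compute the AEPE metric reported above. This is a minimal PyTorch sketch under stated assumptions (the dot-product correlation, shapes, and function names are illustrative), not FlowFormer's actual implementation, which further patchifies the cost maps and projects them into a latent space before the alternate-group transformer layers.

```python
# Minimal sketch (not the authors' code): all-pairs 4D cost volume,
# per-pixel cost tokens, and the AEPE metric. Names are illustrative.
import torch

def cost_volume(feat1, feat2):
    """All-pairs correlation: (B, C, H, W) x (B, C, H, W) -> (B, H, W, H, W)."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2)                       # (B, C, H*W)
    f2 = feat2.flatten(2)                       # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', f1, f2) / C ** 0.5
    return corr.view(B, H, W, H, W)             # 4D cost volume per batch item

def tokenize(cost):
    """One cost token per source pixel: (B, H, W, H, W) -> (B, H*W, H*W).
    A real encoder would project these tokens to a latent dim before attention."""
    B, H, W, _, _ = cost.shape
    return cost.view(B, H * W, H * W)

def aepe(flow_pred, flow_gt):
    """Average end-point error between (B, 2, H, W) flow fields."""
    return torch.norm(flow_pred - flow_gt, dim=1).mean()
```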
Related papers
- Simple ReFlow: Improved Techniques for Fast Flow Models [68.32300636049008]
Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps.
We propose seven improvements for training dynamics, learning and inference.
We achieve state-of-the-art FID scores (without and with guidance, respectively) for fast generation via neural ODEs.
arXiv Detail & Related papers (2024-10-10T11:00:55Z)
- Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget [53.311109531586844]
We demonstrate very low-cost training of large-scale T2I diffusion transformer models.
We train a 1.16 billion parameter sparse transformer at an economical cost of only $1,890 and achieve a 12.7 FID in zero-shot generation.
We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.
arXiv Detail & Related papers (2024-07-22T17:23:28Z)
- DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow [44.57023882737517]
We introduce a lightweight, low-latency, and memory-efficient model for optical flow estimation.
DIFT is feasible for edge applications such as mobile, XR, micro UAVs, robotics, and cameras.
We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP, an efficient mobile AI accelerator.
arXiv Detail & Related papers (2023-06-09T06:10:59Z)
- FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow [49.40637769535569]
This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume Autoencoding (MCVA) for pretraining it, to tackle the problem of optical flow estimation.
FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture.
On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point-error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from the best published results.
arXiv Detail & Related papers (2023-06-08T12:24:04Z)
- FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation [35.0926239683689]
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.
We propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme; a rough sketch of the idea follows this entry.
FlowFormer++ ranks 1st among published methods on both Sintel and KITTI-2015 benchmarks.
arXiv Detail & Related papers (2023-03-02T13:28:07Z)
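To make the MCVA idea above concrete, here is a rough, self-contained sketch of masked autoencoding over cost tokens: a fraction of tokens is replaced by a learned mask token (a BERT-style simplification; the paper's scheme is tailored to cost volumes), the visible and masked positions are encoded with a small transformer, and the reconstruction loss is taken only on the masked positions. All class and parameter names here are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of masked cost-volume autoencoding in the spirit of MCVA.
import torch
import torch.nn as nn

class MaskedCostAutoencoder(nn.Module):
    """Hypothetical sketch: pretrain a cost-token encoder by reconstructing
    randomly masked cost tokens (BERT-style masking for simplicity)."""
    def __init__(self, token_dim, latent_dim=256, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(token_dim, latent_dim)
        self.mask_token = nn.Parameter(torch.zeros(latent_dim))
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(latent_dim, token_dim)   # regress raw cost values

    def forward(self, tokens):                          # tokens: (B, N, token_dim)
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < self.mask_ratio
        x = self.embed(tokens)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        recon = self.head(self.encoder(x))
        return ((recon - tokens)[mask] ** 2).mean()     # loss on masked tokens only
```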
- FQ-ViT: Fully Quantized Vision Transformer without Retraining [13.82845665713633]
We present a systematic method to reduce the performance degradation and inference complexity of quantized Transformers.
We are the first to achieve a comparable accuracy degradation (about 1%) on fully quantized Vision Transformers.
arXiv Detail & Related papers (2021-11-27T06:20:53Z)
- End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture.
We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation; a mask-based sketch of this restriction follows this entry.
arXiv Detail & Related papers (2020-02-10T16:29:26Z)
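The segment restriction described in the entry above can be expressed as an attention mask. The sketch below is an illustrative formulation, not the paper's code: for clarity it builds the full score matrix and masks cross-segment pairs, whereas an efficient implementation would compute attention block-wise per segment to actually realize the savings.

```python
# Illustrative segment-restricted self-attention: each position may only
# attend within its own fixed-length segment.
import torch
import torch.nn.functional as F

def segment_attention(q, k, v, segment_len):
    """q, k, v: (B, T, D). Each position attends only within its own segment,
    reducing the effective context from T to segment_len."""
    B, T, D = q.shape
    seg = torch.arange(T, device=q.device) // segment_len
    same_segment = seg.unsqueeze(0) == seg.unsqueeze(1)   # (T, T) boolean mask
    scores = q @ k.transpose(1, 2) / D ** 0.5             # (B, T, T)
    scores = scores.masked_fill(~same_segment, float('-inf'))
    return F.softmax(scores, dim=-1) @ v
```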
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.