FlowFormer: A Transformer Architecture for Optical Flow
- URL: http://arxiv.org/abs/2203.16194v1
- Date: Wed, 30 Mar 2022 10:33:09 GMT
- Title: FlowFormer: A Transformer Architecture for Optical Flow
- Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung,
Hongwei Qin, Jifeng Dai, and Hongsheng Li
- Abstract summary: Optical Flow TransFormer (FlowFormer) is a transformer-based neural network architecture for learning optical flow.
FlowFormer tokenizes the 4D cost volume built from an image pair and encodes the cost tokens into a cost memory with alternate-group transformer layers.
On the Sintel benchmark clean pass, FlowFormer achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the best published result (1.388).
- Score: 40.6027845855481
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Optical Flow TransFormer (FlowFormer), a transformer-based
neural network architecture for learning optical flow. FlowFormer tokenizes the
4D cost volume built from an image pair, encodes the cost tokens into a cost
memory with alternate-group transformer (AGT) layers in a novel latent space,
and decodes the cost memory via a recurrent transformer decoder with dynamic
positional cost queries. On the Sintel benchmark clean pass, FlowFormer
achieves 1.178 average end-point-error (AEPE), a 15.1% error reduction from the
best published result (1.388). Moreover, FlowFormer achieves strong
generalization performance. Without being trained on Sintel, FlowFormer
achieves 1.00 AEPE on the Sintel training set clean pass, outperforming the
best published result (1.29) by 22.4%.
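To make the pipeline concrete, the sketch below shows one standard way to build an all-pairs 4D cost volume from a pair of feature maps, flatten it into per-pixel cost tokens, and compute the AEPE metric reported above. This is a minimal PyTorch sketch under stated assumptions (the dot-product correlation, shapes, and function names are illustrative), not FlowFormer's actual implementation, which further patchifies the cost maps and projects them into a latent space before the alternate-group transformer layers.

```python
# Minimal sketch (not the authors' code): all-pairs 4D cost volume,
# per-pixel cost tokens, and the AEPE metric. Names are illustrative.
import torch

def cost_volume(feat1, feat2):
    """All-pairs correlation: (B, C, H, W) x (B, C, H, W) -> (B, H, W, H, W)."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2)                       # (B, C, H*W)
    f2 = feat2.flatten(2)                       # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', f1, f2) / C ** 0.5
    return corr.view(B, H, W, H, W)             # 4D cost volume per batch item

def tokenize(cost):
    """One cost token per source pixel: (B, H, W, H, W) -> (B, H*W, H*W).
    A real encoder would project these tokens to a latent dim before attention."""
    B, H, W, _, _ = cost.shape
    return cost.view(B, H * W, H * W)

def aepe(flow_pred, flow_gt):
    """Average end-point error between (B, 2, H, W) flow fields."""
    return torch.norm(flow_pred - flow_gt, dim=1).mean()
```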
Related papers
- Simple ReFlow: Improved Techniques for Fast Flow Models [68.32300636049008]
Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps.
We propose seven improvements for training dynamics, learning and inference.
We achieve state-of-the-art FID scores (without and with guidance, respectively) for fast generation via neural ODEs.
arXiv Detail & Related papers (2024-10-10T11:00:55Z)
- Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget [53.311109531586844]
We demonstrate very low-cost training of large-scale T2I diffusion transformer models.
We train a 1.16 billion parameter sparse transformer at an economical cost of only $1,890 and achieve a 12.7 FID in zero-shot generation.
We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.
arXiv Detail & Related papers (2024-07-22T17:23:28Z)
- DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow [44.57023882737517]
We introduce a lightweight, low-latency, and memory-efficient model for optical flow estimation.
DIFT is feasible for edge applications such as mobile, XR, micro UAVs, robotics, and cameras.
We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP, an efficient mobile AI accelerator.
arXiv Detail & Related papers (2023-06-09T06:10:59Z)
- FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow [49.40637769535569]
This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume Autoencoding (MCVA) for pretraining it, to tackle the problem of optical flow estimation.
FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture.
On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point-error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from the best published results.
arXiv Detail & Related papers (2023-06-08T12:24:04Z)
- FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation [35.0926239683689]
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.
We propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme; a rough sketch of the idea follows this entry.
FlowFormer++ ranks 1st among published methods on both Sintel and KITTI-2015 benchmarks.
arXiv Detail & Related papers (2023-03-02T13:28:07Z)
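To make the MCVA idea above concrete, here is a rough, self-contained sketch of masked autoencoding over cost tokens: a fraction of tokens is replaced by a learned mask token (a BERT-style simplification; the paper's scheme is tailored to cost volumes), the visible and masked positions are encoded with a small transformer, and the reconstruction loss is taken only on the masked positions. All class and parameter names here are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of masked cost-volume autoencoding in the spirit of MCVA.
import torch
import torch.nn as nn

class MaskedCostAutoencoder(nn.Module):
    """Hypothetical sketch: pretrain a cost-token encoder by reconstructing
    randomly masked cost tokens (BERT-style masking for simplicity)."""
    def __init__(self, token_dim, latent_dim=256, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(token_dim, latent_dim)
        self.mask_token = nn.Parameter(torch.zeros(latent_dim))
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(latent_dim, token_dim)   # regress raw cost values

    def forward(self, tokens):                          # tokens: (B, N, token_dim)
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < self.mask_ratio
        x = self.embed(tokens)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        recon = self.head(self.encoder(x))
        return ((recon - tokens)[mask] ** 2).mean()     # loss on masked tokens only
```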
- FQ-ViT: Fully Quantized Vision Transformer without Retraining [13.82845665713633]
We present a systematic method to reduce the performance degradation and inference complexity of quantized Transformers.
We are the first to achieve a comparable accuracy degradation (about 1%) on fully quantized Vision Transformers.
arXiv Detail & Related papers (2021-11-27T06:20:53Z)
- End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture.
We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation; a mask-based sketch of this restriction follows this entry.
arXiv Detail & Related papers (2020-02-10T16:29:26Z)
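The segment restriction described in the entry above can be expressed as an attention mask. The sketch below is an illustrative formulation, not the paper's code: for clarity it builds the full score matrix and masks cross-segment pairs, whereas an efficient implementation would compute attention block-wise per segment to actually realize the savings.

```python
# Illustrative segment-restricted self-attention: each position may only
# attend within its own fixed-length segment.
import torch
import torch.nn.functional as F

def segment_attention(q, k, v, segment_len):
    """q, k, v: (B, T, D). Each position attends only within its own segment,
    reducing the effective context from T to segment_len."""
    B, T, D = q.shape
    seg = torch.arange(T, device=q.device) // segment_len
    same_segment = seg.unsqueeze(0) == seg.unsqueeze(1)   # (T, T) boolean mask
    scores = q @ k.transpose(1, 2) / D ** 0.5             # (B, T, T)
    scores = scores.masked_fill(~same_segment, float('-inf'))
    return F.softmax(scores, dim=-1) @ v
```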
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.