RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
- URL: http://arxiv.org/abs/2003.12039v3
- Date: Tue, 25 Aug 2020 15:49:48 GMT
- Title: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
- Authors: Zachary Teed and Jia Deng
- Abstract summary: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow.
RAFT extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field.
RAFT achieves state-of-the-art performance.
- Score: 78.92562539905951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network
architecture for optical flow. RAFT extracts per-pixel features, builds
multi-scale 4D correlation volumes for all pairs of pixels, and iteratively
updates a flow field through a recurrent unit that performs lookups on the
correlation volumes. RAFT achieves state-of-the-art performance. On KITTI, RAFT
achieves an F1-all error of 5.10%, a 16% error reduction from the best
published result (6.10%). On Sintel (final pass), RAFT obtains an
end-point-error of 2.855 pixels, a 30% error reduction from the best published
result (4.098 pixels). In addition, RAFT has strong cross-dataset
generalization as well as high efficiency in inference time, training speed,
and parameter count. Code is available at https://github.com/princeton-vl/RAFT.
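The pipeline the abstract describes (per-pixel features, an all-pairs 4D correlation volume, a multi-scale pyramid, and window lookups driven by the current flow estimate) can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not the official implementation: the function names are hypothetical, the lookup uses nearest-neighbour rounding where RAFT uses bilinear sampling, and the recurrent GRU update is omitted.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """4D all-pairs correlation volume.
    f1, f2: (H, W, D) feature maps for frames 1 and 2.
    Returns an (H, W, H, W) volume of scaled dot products."""
    H, W, D = f1.shape
    return np.einsum('ijd,kld->ijkl', f1, f2) / np.sqrt(D)

def correlation_pyramid(corr, levels=4):
    """Average-pool the last two dims to get multi-scale volumes,
    keeping full resolution for the query pixel (first two dims)."""
    pyramid = [corr]
    for _ in range(levels - 1):
        c = pyramid[-1]
        H1, W1, H2, W2 = c.shape
        c = c.reshape(H1, W1, H2 // 2, 2, W2 // 2, 2).mean(axis=(3, 5))
        pyramid.append(c)
    return pyramid

def lookup(corr, flow, radius=1):
    """For each pixel, gather correlations in a (2r+1)^2 window
    around the location pointed to by the current flow estimate.
    flow: (H, W, 2) as (dx, dy). Nearest-neighbour sampling here;
    RAFT interpolates bilinearly."""
    H1, W1, H2, W2 = corr.shape
    out = np.zeros((H1, W1, (2 * radius + 1) ** 2))
    for i in range(H1):
        for j in range(W1):
            ci = int(round(i + flow[i, j, 1]))
            cj = int(round(j + flow[i, j, 0]))
            k = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ti = min(max(ci + di, 0), H2 - 1)
                    tj = min(max(cj + dj, 0), W2 - 1)
                    out[i, j, k] = corr[i, j, ti, tj]
                    k += 1
    return out
```

In the full model, the looked-up correlation features are fed, together with context features, into a GRU that emits a flow update at every iteration.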
Related papers
- SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow [29.823972546363716]
We introduce SEA-RAFT, a simpler, more efficient, and more accurate RAFT for optical flow.
SEA-RAFT achieves state-of-the-art accuracy on the Spring benchmark with a 3.69 endpoint-error (EPE) and a 0.36 1-pixel outlier rate (1px).
With its high efficiency, SEA-RAFT operates at least 2.3x faster than existing methods while maintaining competitive performance.
arXiv Detail & Related papers (2024-05-23T17:04:04Z)
- Rethinking RAFT for Efficient Optical Flow [9.115508086522887]
This paper proposes a novel approach based on the RAFT framework.
It incorporates the attention mechanism to handle global feature extraction and address repetitive patterns.
The proposed method, Efficient RAFT (Ef-RAFT), achieves significant improvements of 10% on the Sintel dataset and 5% on the KITTI dataset over RAFT.
arXiv Detail & Related papers (2024-01-01T18:23:39Z)
- Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution [56.089473862929886]
We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian PSF.
With its theoretically guaranteed anti-aliasing, our method sets a new state of the art for arbitrary-scale single image super-resolution.
arXiv Detail & Related papers (2023-11-29T14:01:28Z)
- CLIP-FLow: Contrastive Learning by Semi-supervised Iterative Pseudo-labeling for Optical Flow Estimation [31.773232370688657]
We propose a semi-supervised iterative pseudo-labeling framework to transfer the pretraining knowledge to the target real domain.
We leverage large-scale, unlabeled real data to facilitate transfer learning with the supervision of iteratively updated pseudo-ground truth labels.
Our framework can also be extended to other models, e.g. CRAFT, reducing the F1-all error from 4.79% to 4.66% on KITTI 2015 benchmark.
arXiv Detail & Related papers (2022-10-25T23:22:25Z)
- Differentiable Architecture Search with Random Features [80.31916993541513]
Differentiable architecture search (DARTS) has significantly promoted the development of NAS techniques because of its high search efficiency and effectiveness but suffers from performance collapse.
In this paper, we make efforts to alleviate the performance collapse problem for DARTS with only training BatchNorm.
arXiv Detail & Related papers (2022-08-18T13:55:27Z)
- DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow [7.73554718719193]
We propose a novel Patchmatch-based framework to work on high-resolution optical flow estimation.
It achieves high-precision results with lower memory use, benefiting from the propagation and local search of Patchmatch.
Our method ranks first on all the metrics on the popular KITTI2015 benchmark, and ranks second on EPE on the Sintel clean benchmark among published optical flow methods.
arXiv Detail & Related papers (2022-04-01T10:13:59Z)
- Deep Residual Fourier Transformation for Single Image Deblurring [12.674752421170547]
Reconstructing a sharp image from its blurry counterpart requires recovering both low- and high-frequency information.
We present a Residual Fast Fourier Transform with Convolution Block (Res FFT-Conv Block) capable of capturing both long-term and short-term interactions.
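The idea of a residual block with both a spatial branch and a frequency-domain branch can be illustrated with a minimal NumPy sketch. The structure here is a simplified assumption (per-pixel channel mixing in place of full convolutions), not the paper's exact Res FFT-Conv Block.

```python
import numpy as np

def res_fft_conv_block(x, w_spatial, w_freq):
    """Hedged sketch of a residual FFT block.
    x: (H, W, C) feature map; w_spatial, w_freq: (C, C) weights.
    A spatial branch (local, short-term interactions) and a Fourier
    branch (global, long-term interactions) are summed residually."""
    # spatial branch: per-pixel channel mixing followed by ReLU
    spatial = np.maximum(x @ w_spatial, 0.0)
    # frequency branch: real FFT over spatial dims, channel mix, inverse FFT
    freq = np.fft.rfft2(x, axes=(0, 1))
    freq = freq @ w_freq
    freq = np.fft.irfft2(freq, s=x.shape[:2], axes=(0, 1))
    # residual connection
    return x + spatial + freq
```

Because the FFT mixes every spatial location into every frequency coefficient, the frequency branch gives each output pixel a global receptive field in a single step.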
arXiv Detail & Related papers (2021-11-23T09:40:40Z)
- RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching [60.44903340167672]
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT.
We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image.
A modified version of RAFT-Stereo can perform accurate real-time inference.
arXiv Detail & Related papers (2021-09-15T19:27:31Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to the state-of-the-art perceptual SR methods RankSRGAN and SRFlow while being 2.4x and 48x faster, respectively.
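A loss that directly emphasizes frequencies can be sketched as an L1 distance between the Fourier spectra of prediction and target. This is a minimal illustrative form; the paper's actual losses may weight or select frequencies differently.

```python
import numpy as np

def fourier_l1_loss(pred, target):
    """L1 distance between the complex 2D spectra of two images.
    pred, target: (H, W) arrays. Penalizing spectral differences
    directly targets the frequency content of the reconstruction."""
    fp = np.fft.fft2(pred)
    ft = np.fft.fft2(target)
    return np.mean(np.abs(fp - ft))
```

In training, such a term would typically be added to a pixel-space loss with a weighting coefficient.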
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU, having found that the accuracy decline stems from activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but at only 1/4 the memory cost and 2x the speed on modern GPUs.
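The Bounded ReLU idea can be sketched as clipping activations to a fixed range so they map onto a small integer grid without outliers dominating the scale. The bound of 6 and the uint8 mapping below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def bounded_relu(x, bound=6.0):
    """Clip activations to [0, bound]: an unbounded ReLU would force
    the quantizer to cover rare large values, wasting precision."""
    return np.clip(x, 0.0, bound)

def quantize(x, bound=6.0, bits=8):
    """Map [0, bound] onto integer codes {0 .. 2^bits - 1}.
    Returns the integer codes and the dequantized values."""
    levels = 2 ** bits - 1
    q = np.round(bounded_relu(x, bound) / bound * levels).astype(np.uint8)
    return q, q.astype(np.float64) * bound / levels
```

With an 8-bit grid over [0, 6], the worst-case round-trip error is half a step, i.e. 3/255, which is why bounding the activation range matters for integer-only inference.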
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.