WAFT: Warping-Alone Field Transforms for Optical Flow
- URL: http://arxiv.org/abs/2506.21526v1
- Date: Thu, 26 Jun 2025 17:47:59 GMT
- Title: WAFT: Warping-Alone Field Transforms for Optical Flow
- Authors: Yihan Wang, Jia Deng
- Abstract summary: We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces the cost volume with high-resolution warping, achieving better accuracy with lower memory cost.
- Score: 30.98695432299259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces the cost volume with high-resolution warping, achieving better accuracy with lower memory cost. This design challenges the conventional wisdom that constructing cost volumes is necessary for strong performance. WAFT is a simple and flexible meta-architecture with minimal inductive biases and reliance on custom designs. Compared with existing methods, WAFT ranks 1st on the Spring and KITTI benchmarks and achieves the best zero-shot generalization on KITTI, while being up to 4.1x faster than methods with similar performance. Code and model weights are available at https://github.com/princeton-vl/WAFT.
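For intuition, the warping step that replaces the cost volume can be pictured as a standard backward warp of the second frame's features by the current flow estimate. The sketch below is a generic PyTorch-style illustration under that assumption, not the WAFT reference implementation (see the linked repository for the real code); the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def warp_features(feat2: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp frame-2 features (B, C, H, W) toward frame 1
    using a flow field (B, 2, H, W) given in pixels.

    Generic bilinear warping, shown only to illustrate the idea of a
    warping-alone update; this is not the WAFT reference code.
    """
    _, _, H, W = feat2.shape
    # Pixel-coordinate grid, displaced by the current flow estimate.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat2.device, dtype=feat2.dtype),
        torch.arange(W, device=feat2.device, dtype=feat2.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] and switch to channel-last for grid_sample.
    gx = 2.0 * grid[:, 0] / max(W - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(H - 1, 1) - 1.0
    return F.grid_sample(feat2, torch.stack((gx, gy), dim=-1),
                         mode="bilinear", padding_mode="zeros",
                         align_corners=True)
```

Since the abstract describes WAFT as similar to RAFT, one way such a loop could be organized is to feed the warped features, the frame-1 features, and the current flow to a recurrent update block that predicts a flow residual, with no all-pairs cost volume kept in memory.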
Related papers
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification [50.30835290642069]
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs).
We reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model.
We propose Dynamic Fine-Tuning (DFT), which stabilizes the gradient update for each token by dynamically rescaling the objective function with that token's probability.
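A minimal sketch of that rescaling idea, assuming a standard next-token cross-entropy objective and hypothetical tensor shapes (an illustrative reading of the summary, not the authors' released code):

```python
import torch
import torch.nn.functional as F

def dft_style_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on next-token targets, rescaled per token by the model's
    own (detached) probability of that token.

    logits: (B, T, V), targets: (B, T).  Illustrative only.
    """
    log_probs = F.log_softmax(logits, dim=-1)                           # (B, T, V)
    tgt_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, T)
    weight = tgt_logp.exp().detach()   # p(token), used as a constant scale factor
    return -(weight * tgt_logp).mean() # rescaled SFT objective
```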
arXiv Detail & Related papers (2025-08-07T17:59:04Z) - Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
Sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model.
We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies.
Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z) - Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - DEFT: Efficient Fine-Tuning of Diffusion Models by Learning the Generalised $h$-transform [44.29325094229024]
We propose DEFT (Doob's h-transform Efficient FineTuning), a new approach for conditional generation that simply fine-tunes a very small network to quickly learn the conditional $h$-transform.
On image reconstruction tasks, we achieve speedups of up to 1.6x, while having the best perceptual quality on natural images and reconstruction performance on medical images.
arXiv Detail & Related papers (2024-06-03T20:52:34Z) - SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow [29.823972546363716]
We introduce SEA-RAFT, a simpler, more efficient, and more accurate RAFT for optical flow.
SEA-RAFT achieves state-of-the-art accuracy on the Spring benchmark with a 3.69 endpoint-error (EPE) and a 0.36 1-pixel outlier rate (1px).
With its high efficiency, SEA-RAFT operates at least 2.3x faster than existing methods while maintaining competitive performance.
arXiv Detail & Related papers (2024-05-23T17:04:04Z) - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low compression rate and high compression rate.
arXiv Detail & Related papers (2024-03-14T09:06:49Z) - From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa-Large and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL.
arXiv Detail & Related papers (2024-02-02T21:25:46Z) - CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation [3.043665249713003]
Post-training compression techniques such as pruning and quantization can help lower deployment costs.
We propose CrAFT, a simple fine-tuning framework that enables effective post-training network compression.
The CrAFT approach adds negligible training overhead, as fine-tuning completes within a couple of minutes or hours on a single GPU.
arXiv Detail & Related papers (2023-05-08T07:51:40Z) - Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency [1.292809267782105]
Sparse Iso-FLOP Transformations (Sparse-IFT) uses sparsity to improve accuracy while maintaining dense model FLOPs.
Our study reveals a robust correlation among mask topology, weights, and final performance.
To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models.
arXiv Detail & Related papers (2023-03-21T01:06:37Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that adds no extra computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
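One simple way to read the ALR step described above is to match the L2 norm of each predicted novel-class weight vector to the average norm of the pretrained base weights. The sketch below illustrates that reading with hypothetical tensor shapes; the paper's exact adaptive rule may differ.

```python
import torch

def rescale_novel_weights(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    """Rescale predicted novel-class weights (N_novel, D) so their L2 norms
    match the average norm of pretrained base-class weights (N_base, D).

    A simple interpretation of adaptive length re-scaling; illustrative only.
    """
    target_norm = base_w.norm(dim=1).mean()                      # average base length
    novel_norm = novel_w.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return novel_w * (target_norm / novel_norm)
```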
arXiv Detail & Related papers (2022-03-23T06:24:31Z)