DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical
Flow
- URL: http://arxiv.org/abs/2306.05691v1
- Date: Fri, 9 Jun 2023 06:10:59 GMT
- Title: DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical
Flow
- Authors: Risheek Garrepalli, Jisoo Jeong, Rajeswaran C Ravindran, Jamie Menjay
Lin and Fatih Porikli
- Abstract summary: We introduce a lightweight, low-latency, and memory-efficient model for optical flow estimation.
DIFT is feasible for edge applications such as mobile, XR, micro UAVs, robotics, and cameras.
We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator.
- Score: 44.57023882737517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in neural network-based optical flow estimation often
come with prohibitively high computational and memory requirements, presenting
challenges to their adaptation for mobile and low-power use cases. In this
paper, we introduce a lightweight, low-latency, and memory-efficient model,
Dynamic Iterative Field Transforms (DIFT), for optical flow estimation, feasible
for edge applications such as mobile, XR, micro UAVs, robotics, and cameras.
DIFT follows an iterative refinement framework leveraging variable resolution
of cost volumes for correspondence estimation. We propose a memory-efficient
solution for cost volume processing to reduce peak memory. We also present a
novel dynamic coarse-to-fine cost volume processing scheme across the stages of
refinement to avoid maintaining multiple levels of cost volumes. We demonstrate
the first real-time cost-volume-based optical flow DL architecture on the
Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator, achieving 32 inf/sec and
5.89 EPE (endpoint error) on KITTI with manageable accuracy-performance
tradeoffs.
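The peak-memory problem DIFT targets stems from the all-pairs correlation cost volume, whose size grows quadratically with the number of pixels. A minimal NumPy sketch of such a cost volume (the normalization and memory layout here are illustrative assumptions, not DIFT's actual implementation):

```python
import numpy as np

def all_pairs_cost_volume(f1, f2):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Returns a 4D cost volume of shape (H, W, H, W); its O((H*W)^2) size
    is exactly the peak-memory cost that DIFT's dynamic coarse-to-fine
    processing is designed to avoid materializing at full resolution.
    """
    C, H, W = f1.shape
    a = f1.reshape(C, H * W)            # (C, HW) source features
    b = f2.reshape(C, H * W)            # (C, HW) target features
    cv = a.T @ b / np.sqrt(C)           # (HW, HW) normalized correlation
    return cv.reshape(H, W, H, W)
```

At a 1024x436 resolution the (HW, HW) matrix alone would take hundreds of gigabytes, which is why memory-efficient variants compute correlations at reduced resolution or on demand.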
Related papers
- Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning [18.776903525210933]
We introduce an efficient fine-tuning method for ViTs called ALaST (Adaptive Layer Selection Fine-Tuning for Vision Transformers).
Our approach is based on the observation that not all layers are equally critical during fine-tuning, and their importance varies depending on the current mini-batch.
We show that this adaptive compute allocation enables a nearly-optimal schedule for distributing computational resources.
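The per-mini-batch layer selection described above can be sketched as a top-k gate over transformer blocks. The scoring rule and identity-skip behavior below are hypothetical stand-ins, since the abstract does not specify ALaST's actual mechanism:

```python
import numpy as np

def adaptive_layer_forward(x, layers, importance, k):
    """Run only the k most 'important' layers for the current mini-batch.

    `layers` is a list of callables (one per transformer block) and
    `importance` a per-layer score computed for this batch; both are
    illustrative placeholders for ALaST's learned allocation.
    """
    keep = set(np.argsort(importance)[-k:])   # indices of the top-k layers
    for i, layer in enumerate(layers):
        if i in keep:
            x = layer(x)                      # full compute for kept layers
        # skipped layers act as the identity, saving their compute
    return x
```

The point of the sketch is that compute scales with k rather than with the total depth, which is the budgeted allocation the paper argues is nearly optimal.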
arXiv Detail & Related papers (2024-08-16T11:27:52Z)
- Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume [6.122542233250026]
We present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation.
Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
arXiv Detail & Related papers (2023-12-06T12:43:11Z)
- FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow [49.40637769535569]
This paper introduces a novel transformer-based network architecture, FlowFormer, along with Masked Cost Volume Autoencoding (MCVA) for pretraining it, to tackle the problem of optical flow estimation.
FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture.
On the Sintel benchmark, the FlowFormer architecture achieves 1.16 and 2.09 average end-point error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from the
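FlowFormer's tokenization step, reduced to its essentials, treats each source pixel's full cost map as one token. A minimal sketch (the real model adds patchification and learned projection layers, which are omitted here):

```python
import numpy as np

def cost_volume_tokens(cv):
    """Turn a 4D cost volume of shape (H, W, H, W) into a token sequence:
    one token per source pixel, whose features are that pixel's matching
    costs against every target pixel. A bare-bones illustration of
    FlowFormer-style cost tokenization, not the model's exact layout.
    """
    H, W, H2, W2 = cv.shape
    return cv.reshape(H * W, H2 * W2)   # (num_tokens, token_dim)
```

The resulting (num_tokens, token_dim) matrix is what a transformer encoder can attend over, compressing the cost volume into a cost memory.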
arXiv Detail & Related papers (2023-06-08T12:24:04Z)
- READ: Recurrent Adaptation of Large Transformers [7.982905666062059]
Fine-tuning large-scale Transformers becomes impractical as the model size and number of tasks increase.
We introduce REcurrent ADaption (READ) -- a lightweight and memory-efficient fine-tuning method.
arXiv Detail & Related papers (2023-05-24T16:59:41Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that enables maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Lightweight Event-based Optical Flow Estimation via Iterative Deblurring [22.949700247611695]
We introduce IDNet, a lightweight yet high-performing event-based optical flow network directly estimating flow from event traces without using correlation volumes.
Our top-performing ID model sets a new state of the art on the DSEC benchmark.
Our base ID model is competitive with prior art while using 80% fewer parameters, consuming a 20x smaller memory footprint, and running 40% faster on the NVIDIA Jetson Xavier NX.
arXiv Detail & Related papers (2022-11-24T17:26:27Z)
- FlowFormer: A Transformer Architecture for Optical Flow [40.6027845855481]
Optical Flow TransFormer (FlowFormer) is a transformer-based neural network architecture for learning optical flow.
FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer layers.
On the Sintel benchmark clean pass, FlowFormer achieves 1.178 average end-point error (AEPE), a 15.1% error reduction from the best published result (1.388).
arXiv Detail & Related papers (2022-03-30T10:33:09Z)
- HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks.
We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet)
We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency parts are processed with expensive operations, while the lower-frequency parts are assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
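The DCT-domain split described above can be illustrated directly: transform a block, mask the low-index coefficients, and invert. The i+j cutoff rule below is an illustrative assumption, not the paper's exact partitioning:

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix (n x n), built from its definition."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    M[0] /= np.sqrt(2.0)
    return M

def split_by_frequency(block, cutoff):
    """Split a square image block into low- and high-frequency parts in the
    DCT domain: coefficients with index sum i+j below `cutoff` form the
    'cheap' low-frequency band, the remainder the 'expensive' high band.
    """
    n = block.shape[0]
    M = dct2_matrix(n)
    coeffs = M @ block @ M.T                               # 2D DCT
    low_mask = np.add.outer(np.arange(n), np.arange(n)) < cutoff
    low = M.T @ (coeffs * low_mask) @ M                    # inverse DCT of low band
    high = block - low                                     # remainder = high band
    return low, high
```

Because the DCT is orthonormal, the two bands sum exactly back to the input, so routing them to branches of different cost loses no information at the split.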
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and require heavy computation.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
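The coarse-to-fine manner mentioned above follows a standard pattern: estimate flow at the coarsest level, then repeatedly upsample (scaling both resolution and flow values by 2) and refine. A generic sketch, with the per-level network replaced by a caller-supplied stub:

```python
import numpy as np

def coarse_to_fine_flow(estimate_at_level, num_levels):
    """Generic coarse-to-fine flow estimation skeleton.

    `estimate_at_level(level, flow)` stands in for the per-level network
    update: at the coarsest level it receives flow=None and produces an
    initial estimate; at finer levels it refines the upsampled flow.
    Flow arrays have shape (2, H, W).
    """
    flow = None
    for level in range(num_levels - 1, -1, -1):            # coarsest -> finest
        if flow is not None:
            # double spatial resolution and flow magnitude together
            flow = 2.0 * np.repeat(np.repeat(flow, 2, axis=1), 2, axis=2)
        flow = estimate_at_level(level, flow)
    return flow
```

This structure lets each level search only a small residual displacement range, which is what keeps pyramid-based networks like FastFlowNet cheap relative to single-resolution all-pairs matching.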
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.