CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices
- URL: http://arxiv.org/abs/2412.13273v1
- Date: Tue, 17 Dec 2024 19:06:12 GMT
- Authors: Andrei Znobishchev, Valerii Filev, Oleg Kudashev, Nikita Orlov, Humphrey Shi
- Score: 19.80162591240214
- Abstract: We present CompactFlowNet, the first real-time mobile neural network for optical flow prediction, which involves determining the displacement of each pixel in an initial frame relative to the corresponding pixel in a subsequent frame. Optical flow serves as a fundamental building block for various video-related tasks, such as video restoration, motion estimation, video stabilization, object tracking, action recognition, and video generation. While current state-of-the-art methods prioritize accuracy, they often overlook constraints regarding speed and memory usage. Existing light models typically focus on reducing size but still exhibit high latency, compromise significantly on quality, or are optimized for high-performance GPUs, resulting in sub-optimal performance on mobile devices. This study aims to develop a mobile-optimized optical flow model by proposing a novel mobile device-compatible architecture, as well as enhancements to the training pipeline, which optimize the model for reduced weight, low memory utilization, and increased speed while maintaining minimal error. Our approach demonstrates superior or comparable performance to the state-of-the-art lightweight models on the challenging KITTI and Sintel benchmarks. Furthermore, it attains a significantly accelerated inference speed, thereby yielding real-time operational efficiency on the iPhone 8, while surpassing real-time performance levels on more advanced mobile devices.
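Concretely, a dense flow field assigns each pixel a 2-D displacement vector, and accuracy on benchmarks such as KITTI and Sintel is typically reported as the average end-point error (EPE) between predicted and ground-truth flow. A minimal NumPy sketch of both ideas (function names are illustrative, not taken from the paper):

```python
import numpy as np

def warp_with_flow(frame2: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp the second frame toward the first with a dense flow field.

    frame2: (H, W) grayscale image; flow: (H, W, 2) per-pixel (dx, dy)
    displacements, i.e. pixel (x, y) in frame 1 maps to (x+dx, y+dy) in frame 2.
    """
    h, w = frame2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame2[ys2, xs2]  # nearest-neighbor sampling for simplicity

def endpoint_error(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Average end-point error (EPE): mean Euclidean distance between flows."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))
```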
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Disentangled Motion Modeling for Video Frame Interpolation [40.83962594702387]
Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality.
We introduce Disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling.
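For context, the classic flow-based interpolation baseline that diffusion approaches like MoMo aim to improve warps both input frames toward the target time with linearly scaled flows and blends the results. A hedged NumPy sketch of that generic baseline (not MoMo's actual model):

```python
import numpy as np

def interpolate_frame(frame1, frame2, flow_fw, flow_bw, t=0.5):
    """Synthesize the frame at time t in (0, 1) between frame1 and frame2.

    flow_fw: (H, W, 2) flow from frame1 to frame2; flow_bw: the reverse.
    Assumes roughly linear motion, so F_{t->0} ~ t * F_{1->0} and
    F_{t->1} ~ (1 - t) * F_{0->1}.
    """
    def backward_warp(img, flow):
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        xs2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
        ys2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
        return img[ys2, xs2]

    warped1 = backward_warp(frame1, t * flow_bw)        # pull pixels from frame1
    warped2 = backward_warp(frame2, (1 - t) * flow_fw)  # pull pixels from frame2
    return (1 - t) * warped1 + t * warped2              # time-weighted blend
```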
arXiv Detail & Related papers (2024-06-25T03:50:20Z)
- Track Everything Everywhere Fast and Robustly [46.362962852140015]
We propose a novel test-time optimization approach for efficiently tracking any pixel in a video.
We introduce a novel invertible deformation network, CaDeX++, which factorizes the function representation into a local spatial-temporal feature grid.
Our experiments demonstrate a substantial improvement in training speed (more than 10x faster), robustness, and tracking accuracy over the state-of-the-art optimization-based method OmniMotion.
arXiv Detail & Related papers (2024-03-26T17:58:22Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
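The step count reflects how flow matching samples: a learned velocity field is integrated as an ODE from noise to data, so a handful of Euler steps can replace a long diffusion chain. A generic sketch under that assumption (velocity_fn is a placeholder for the trained network, not the authors' code):

```python
import numpy as np

def sample_flow_matching(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the learned ODE
    return x

# Toy usage: a linear velocity field that transports samples onto a target.
target = np.ones(4)
sample = sample_flow_matching(
    lambda x, t: (target - x) / max(1.0 - t, 1e-3),  # stand-in for the network
    x0=np.random.randn(4),
)
```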
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- Neuromorphic Optical Flow and Real-time Implementation with Event Cameras [47.11134388304464]
We build on the latest developments in event-based vision and spiking neural networks.
We propose a new network architecture that improves the state-of-the-art self-supervised optical flow accuracy.
We demonstrate high-speed optical flow prediction at almost two orders of magnitude lower complexity.
arXiv Detail & Related papers (2023-04-14T14:03:35Z)
- Lightweight network towards real-time image denoising on mobile devices [26.130379174715742]
Deep convolutional neural networks have achieved great progress in image denoising tasks.
However, their complicated architectures and heavy computational costs hinder their deployment on mobile devices.
We propose a mobile-friendly denoising network, namely MFDNet.
arXiv Detail & Related papers (2022-11-09T05:19:26Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity to predict the future, significantly improving the results for streaming perception.
We consider driving scenes at multiple velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset, improving sAP and VsAP by 4.7% and 8.2%, respectively.
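For intuition, streaming metrics such as sAP pair each ground-truth frame with the latest prediction that has already finished by that frame's timestamp, so latency directly costs accuracy. A minimal sketch of that pairing rule (illustrative only, not the official evaluation toolkit):

```python
def streaming_pairs(pred_stream, gt_stream):
    """Pair each ground-truth frame with the newest prediction finished by then.

    pred_stream: list of (finish_time, prediction), sorted by finish_time;
    gt_stream: list of (timestamp, ground_truth), sorted by timestamp.
    """
    pairs, i, latest = [], 0, None
    for t_gt, gt in gt_stream:
        while i < len(pred_stream) and pred_stream[i][0] <= t_gt:
            latest = pred_stream[i][1]  # this prediction finished in time
            i += 1
        pairs.append((gt, latest))      # None if nothing has finished yet
    return pairs
```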
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow [0.0]
Event-based vision sensors produce asynchronous event streams with high temporal resolution based on changes in the visual scene.
Existing solutions for calculating optical flow from event data fail to capture the true direction of motion due to the aperture problem.
We present a hardware realization of the fARMS algorithm allowing for real-time computation of true flow on low-power, embedded platforms.
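For background on why purely local methods fall short: brightness constancy gives one equation, grad(I) . v = -dI/dt, in two unknowns per pixel, so a local measurement only recovers the flow component along the image gradient. A small generic illustration of this aperture problem (not the fARMS algorithm itself):

```python
import numpy as np

def normal_flow(grad_xy, dI_dt):
    """Flow component recoverable from a single local measurement.

    Only the projection of the true velocity onto the gradient direction is
    determined; the component along the edge is invisible (aperture problem).
    """
    mag = np.linalg.norm(grad_xy)
    return (-dI_dt / mag) * (grad_xy / mag)  # normal flow vector

# A vertical edge (gradient along x) moving diagonally looks purely horizontal:
print(normal_flow(np.array([1.0, 0.0]), dI_dt=-0.7))  # only [0.7, 0.] is seen
```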
arXiv Detail & Related papers (2021-12-13T16:27:17Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with several innovations.
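As background, coarse-to-fine estimation computes flow on a downscaled image pyramid, then repeatedly upsamples it and adds a residual correction at each finer level. A schematic NumPy sketch (predict_residual is a hypothetical stand-in for FastFlowNet's learned per-level module):

```python
import numpy as np

def coarse_to_fine_flow(frame1, frame2, predict_residual, num_levels=4):
    """Schematic coarse-to-fine flow over a 2x-subsampled image pyramid."""
    pyr1, pyr2 = [frame1], [frame2]
    for _ in range(num_levels - 1):          # coarsest level ends up last
        pyr1.append(pyr1[-1][::2, ::2])
        pyr2.append(pyr2[-1][::2, ::2])

    flow = np.zeros(pyr1[-1].shape + (2,))   # zero flow at the coarsest level
    for f1, f2 in zip(reversed(pyr1), reversed(pyr2)):
        if flow.shape[:2] != f1.shape:
            # Upsample the flow to the next level and double its magnitude.
            flow = 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)
            flow = flow[:f1.shape[0], :f1.shape[1]]
        flow = flow + predict_residual(f1, f2, flow)  # per-level refinement
    return flow
```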
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image can be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)