Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
- URL: http://arxiv.org/abs/2506.07740v1
- Date: Mon, 09 Jun 2025 13:23:44 GMT
- Title: Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
- Authors: Yingping Liang, Ying Fu, Yutao Hu, Wenqi Shao, Jiaming Liu, Debing Zhang
- Abstract summary: We develop a large-scale data generation framework designed to learn optical flow estimation from any single-view image in the real world. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images. Our models serve as a foundation model and enhance the performance of various downstream video tasks.
- Score: 23.731451842621933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optical flow estimation is a crucial subfield of computer vision, serving as a foundation for video tasks. However, its real-world robustness is limited by the animated synthetic datasets used for training. These introduce domain gaps when models are applied to real-world scenarios and limit the benefits of scaling up datasets. To address these challenges, we propose \textbf{Flow-Anything}, a large-scale data generation framework designed to learn optical flow estimation from any single-view image in the real world. We employ two effective steps to make data scaling up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel-view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic training datasets from large-scale single-view images, namely the \textbf{FA-Flow Dataset}. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods trained on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.
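The first step described in the abstract, rendering optical flow from a depth map under a virtual camera, follows the standard pinhole-projection relations: each pixel is back-projected to 3D using the depth, transformed by the virtual camera motion, and re-projected; the displacement between projections is the flow. The sketch below is a minimal illustration of that geometry under assumed names and parameters, not the paper's implementation; occlusion handling and dynamic objects are omitted.

```python
import numpy as np

def flow_from_depth(depth, K, R, t):
    """Render the optical flow induced by a virtual camera motion (R, t)
    on a static scene, given per-pixel depth and intrinsics K.
    Simplified sketch: no occlusion reasoning, no dynamic objects."""
    h, w = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, N).
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    # Back-project each pixel to a 3D point: X = d * K^-1 * p.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Apply the virtual camera motion and re-project through K.
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / proj[2:]
    # Flow is the displacement between new and old pixel positions.
    return (uv - pix[:2]).T.reshape(h, w, 2)
```

For a constant depth d and a pure translation t = (tx, 0, 0), this reduces to a uniform horizontal flow of fx * tx / d pixels, which makes the function easy to sanity-check.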
Related papers
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- Improving Unsupervised Video Object Segmentation via Fake Flow Generation [20.89278343723177]
We propose a novel data generation method that simulates fake optical flows from single images.
Inspired by the observation that optical flow maps are highly dependent on depth maps, we generate fake optical flows by refining and augmenting the estimated depth maps of each image.
arXiv Detail & Related papers (2024-07-16T13:32:50Z)
- Real3D: Scaling Up Large Reconstruction Models with Real-World Images [34.735198125706326]
Real3D is the first LRM system that can be trained using single-view real-world images.
We propose two unsupervised losses that allow us to supervise LRMs at the pixel- and semantic-level.
We develop an automatic data curation approach to collect high-quality examples from in-the-wild images.
arXiv Detail & Related papers (2024-06-12T17:59:08Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [111.16358607889609]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- MPI-Flow: Learning Realistic Optical Flow with Multiplane Images [18.310665144874775]
We investigate generating realistic optical flow datasets from real-world images.
To generate highly realistic new images, we construct a layered depth representation, known as multiplane images (MPI), from single-view images.
To ensure the realism of motion, we present an independent object motion module that can separate the camera and dynamic object motion in MPI.
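The layered depth representation mentioned above is rendered by compositing fronto-parallel RGBA planes front-to-back with the standard "over" operator, C = sum_i c_i * a_i * prod_{j<i} (1 - a_j). A minimal sketch of that compositing step, with illustrative names and shapes rather than MPI-Flow's actual code:

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Composite multiplane-image layers front-to-back with the
    'over' operator. colors: (L, H, W, 3), alphas: (L, H, W),
    where layer 0 is the plane nearest to the camera."""
    out = np.zeros(colors.shape[1:])
    transmittance = np.ones(alphas.shape[1:])  # fraction of light still visible
    for c, a in zip(colors, alphas):
        out += transmittance[..., None] * a[..., None] * c
        transmittance *= (1.0 - a)
    return out
```

With an opaque front plane the output is exactly that plane's color; with a transparent front plane the next layer shows through, which is the behavior the formula above encodes.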
arXiv Detail & Related papers (2023-09-13T04:31:00Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos [28.995525297929348]
RealFlow is a framework that can create large-scale optical flow datasets directly from unlabeled realistic videos.
We first estimate optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow.
Our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods.
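The synthesis step described above, producing a new image from a frame pair given a predicted flow, reduces to warping one frame by the flow field. Below is a minimal backward-warping sketch with bilinear sampling via SciPy's `map_coordinates`; the function name is illustrative, and this is not RealFlow's actual implementation (which additionally handles occlusions within an EM loop):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(image, flow):
    """Warp `image` by a flow field so that
    warped[y, x] = image[y + v, x + u], where flow[..., 0] = u and
    flow[..., 1] = v. Uses bilinear interpolation and clamps at borders."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = [ys + flow[..., 1], xs + flow[..., 0]]  # sampling positions
    channels = [map_coordinates(image[..., c], coords, order=1, mode='nearest')
                for c in range(image.shape[2])]
    return np.stack(channels, axis=-1)
```

Backward warping (sampling the source at displaced positions) is preferred over forward splatting here because every output pixel gets a well-defined value, at the cost of ignoring occlusions.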
arXiv Detail & Related papers (2022-07-22T13:33:03Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image is of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.