DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation
- URL: http://arxiv.org/abs/2507.19790v1
- Date: Sat, 26 Jul 2025 04:31:16 GMT
- Title: DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation
- Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Sangyoun Lee
- Abstract summary: Video object segmentation (VOS) aims to detect the most prominent object in a video. We propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. We achieve new state-of-the-art performance on all public VOS benchmarks, demonstrating a scalable and effective solution to the data scarcity problem.
- Score: 14.635179908525389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. Our approach is driven by the key insight that VOS models depend more on structural information embedded in flow maps than on their geometric accuracy, and that this structure is highly correlated with depth. We first estimate a depth map from a source image and then convert it into a synthetic flow field that preserves essential structural cues. This process enables the transformation of large-scale image-mask pairs into image-flow-mask training pairs, dramatically expanding the data available for network training. By training a simple encoder-decoder architecture with our synthesized data, we achieve new state-of-the-art performance on all public VOS benchmarks, demonstrating a scalable and effective solution to the data scarcity problem.
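A minimal sketch of the depth-to-flow conversion described in the abstract (not the authors' released code; the inverse-depth parallax model, the single random translation direction, and the `max_mag` parameter are illustrative assumptions):

```python
# Illustrative sketch: convert an estimated depth map into a synthetic flow
# field whose structure follows depth. The inverse-depth scaling and random
# global direction are assumptions, not the paper's exact formulation.
import numpy as np

def depth_to_synthetic_flow(depth, max_mag=20.0, rng=None):
    """Map an HxW depth array to an HxWx2 synthetic optical flow field."""
    rng = rng or np.random.default_rng()
    inv = 1.0 / np.clip(depth, 1e-6, None)          # nearer pixels move more
    inv = (inv - inv.min()) / (np.ptp(inv) + 1e-6)  # normalize to [0, 1]
    theta = rng.uniform(0.0, 2.0 * np.pi)           # one random global direction
    direction = np.array([np.cos(theta), np.sin(theta)])
    return inv[..., None] * direction * max_mag     # flow structure mirrors depth

# Pairing each (image, mask) with such a flow yields the image-flow-mask
# triplets used to train a two-stream VOS network.
```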
Related papers
- Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images [23.731451842621933]
We develop a large-scale data generation framework designed to learn optical flow estimation from single-view images in the real world.
For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images.
Our models serve as a foundation model and enhance the performance of various downstream video tasks.
arXiv Detail & Related papers (2025-06-09T13:23:44Z)
- Improving Unsupervised Video Object Segmentation via Fake Flow Generation [20.89278343723177]
We propose a novel data generation method that simulates fake optical flows from single images.
Inspired by the observation that optical flow maps are highly dependent on depth maps, we generate fake optical flows by refining and augmenting the estimated depth maps of each image.
arXiv Detail & Related papers (2024-07-16T13:32:50Z)
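Given the shared depth-to-flow idea, the "refining and augmenting" step of the entry above can be pictured as perturbing the depth map before conversion. A hypothetical sketch; the Gaussian smoothing and random rescaling below are stand-ins, not the paper's exact recipe:

```python
# Hypothetical depth refinement/augmentation before fake-flow conversion.
import numpy as np
from scipy.ndimage import gaussian_filter

def augment_depth(depth, rng=None):
    """Return a perturbed copy of an HxW depth map for fake-flow generation."""
    rng = rng or np.random.default_rng()
    d = gaussian_filter(depth, sigma=1.0)       # "refine": smooth depth noise
    scale = rng.uniform(0.8, 1.2)               # "augment": random rescale
    shift = rng.uniform(-0.1, 0.1) * d.mean()   # "augment": random offset
    return np.clip(d * scale + shift, 1e-6, None)

# The perturbed depth can then be fed to a depth-to-flow conversion such as
# depth_to_synthetic_flow() above, producing varied fake flows per image.
```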
- Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation [1.551271936792451]
We propose a state-of-the-art neural network architecture to generate moving object proposals (MOP).
We first train an unsupervised convolutional neural network (UnFlow) to estimate optical flow.
We then feed the output of the optical flow network into a fully convolutional SegNet model.
arXiv Detail & Related papers (2024-02-14T01:13:55Z)
- Hierarchical Graph Pattern Understanding for Zero-Shot VOS [102.21052200245457]
This paper proposes a new hierarchical graph neural network (GNN) architecture, HGPU, for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs to capture structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
arXiv Detail & Related papers (2023-12-15T04:13:21Z)
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos [28.995525297929348]
RealFlow is a framework that can create large-scale optical flow datasets directly from unlabeled realistic videos.
We first estimate optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow.
Our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods.
arXiv Detail & Related papers (2022-07-22T13:33:03Z)
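A toy version of RealFlow's frame-synthesis step: given a frame and a predicted flow, forward-splat the pixels so that (frame, new frame, flow) forms a labeled training pair. RealFlow itself alternates this generation with flow re-training in an EM loop and uses softer splatting with occlusion handling; this nearest-pixel sketch is a deliberate simplification:

```python
# Toy forward splatting: synthesize a new frame consistent with a given flow.
import numpy as np

def splat_new_frame(frame, flow):
    """frame: HxWx3 array, flow: HxWx2 (dx, dy). Returns the splatted frame."""
    h, w = frame.shape[:2]
    new_frame = np.zeros_like(frame)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    new_frame[yt, xt] = frame[ys, xs]   # move each source pixel along its flow
    return new_frame                    # (frame, new_frame) has flow as label
```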
- USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks [31.600708674008384]
USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
arXiv Detail & Related papers (2022-07-15T13:25:47Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Deep Direct Volume Rendering: Learning Visual Feature Mappings From Exemplary Images [57.253447453301796]
We introduce Deep Direct Volume Rendering (DeepDVR), a generalization of Direct Volume Rendering (DVR) that allows for the integration of deep neural networks into the DVR algorithm.
We conceptualize the rendering in a latent color space, thus enabling the use of deep architectures to learn implicit mappings for feature extraction and classification.
Our generalization serves to derive novel volume rendering architectures that can be trained end-to-end directly from examples in image space.
arXiv Detail & Related papers (2021-06-09T23:03:00Z)
- Learning optical flow from still images [53.295332513139925]
We introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture.
We virtually move the camera in the reconstructed environment with known motion vectors and rotation angles.
When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data.
arXiv Detail & Related papers (2021-04-08T17:59:58Z)
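The known-motion idea in the entry above reduces to classical reprojection: back-project pixels with depth, move the virtual camera, re-project, and read the displacement off as flow. A hedged sketch assuming a monocular depth map supplies the "reconstructed environment"; the paper's occlusion and collision handling is omitted, and `K`, `R`, `t` are placeholders:

```python
# Flow induced by a known virtual camera motion over an estimated depth map.
import numpy as np

def flow_from_depth(depth, K, R, t):
    """depth: HxW; K: 3x3 intrinsics; R: 3x3 rotation; t: 3-vector translation.
    Returns the HxWx2 optical flow induced by the virtual camera motion."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3xN
    rays = np.linalg.inv(K) @ pix                    # back-project to rays
    pts = rays * depth.reshape(1, -1)                # 3D points in camera 1
    proj = K @ (R @ pts + t.reshape(3, 1))           # move camera, re-project
    uv = (proj[:2] / proj[2:3]).T.reshape(h, w, 2)   # new pixel coordinates
    return uv - np.stack([xs, ys], axis=-1)          # flow = displacement
```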
- Feature Flow: In-network Feature Flow Estimation for Video Object Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
arXiv Detail & Related papers (2020-09-21T07:55:50Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
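The symmetric gated fusion in the entry above can be pictured as learned per-pixel gates that let each modality branch absorb the other. This toy PyTorch module illustrates the general pattern only; it is an assumption, not ACMNet's actual design:

```python
# Toy symmetric gated fusion of two modality feature maps (e.g., RGB and depth).
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One gate per direction, each conditioned on both modalities.
        self.gate_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        both = torch.cat([feat_a, feat_b], dim=1)
        ga = torch.sigmoid(self.gate_a(both))   # how much of b flows into a
        gb = torch.sigmoid(self.gate_b(both))   # how much of a flows into b
        return feat_a + ga * feat_b, feat_b + gb * feat_a

# Example: fused_a, fused_b = SymmetricGatedFusion(64)(rgb_feat, depth_feat)
```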