Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world
- URL: http://arxiv.org/abs/2410.10453v1
- Date: Mon, 14 Oct 2024 12:46:17 GMT
- Title: Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world
- Authors: Han Ling, Yinghui Sun, Quansen Sun, Ivor Tsang, Yuhui Zheng
- Abstract summary: We propose a unified self-supervised generalization framework for optical flow and stereo tasks: Self-Assessed Generation (SAG).
Unlike previous self-supervised methods, SAG is data-driven, using advanced reconstruction techniques to construct a reconstruction field from RGB images and generate datasets based on it.
- Score: 24.251352190100135
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A significant challenge facing current optical flow and stereo methods is the difficulty of generalizing them to the real world. This is mainly due to the high cost of producing datasets and to the limitations of existing self-supervised methods, which suffer from fuzzy results and complex model training. To address these challenges, we propose a unified self-supervised generalization framework for optical flow and stereo tasks: Self-Assessed Generation (SAG). Unlike previous self-supervised methods, SAG is data-driven, using advanced reconstruction techniques to construct a reconstruction field from RGB images and generate datasets based on it. We then quantify the confidence level of the generated results from multiple perspectives, such as reconstruction field distribution, geometric consistency, and structural similarity, to eliminate inevitable defects in the generation process. We also design a 3D flight foreground automatic rendering pipeline in SAG to encourage the network to learn occlusion and foreground motion. Because SAG does not involve changes to methods or loss functions, it can be used directly to train state-of-the-art deep networks in a self-supervised manner, greatly improving the generalization performance of self-supervised methods on current mainstream optical flow and stereo-matching datasets. Compared to previous training modes, SAG is more generalized, cost-effective, and accurate.
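The abstract names geometric consistency as one of the perspectives used to assess generated labels, but it does not say how the check is computed. As a rough illustration only, the sketch below shows a generic left-right consistency test for stereo disparity maps, a common way to flag unreliable pixels. It is not the authors' implementation: the function name `left_right_consistency_mask`, the `threshold` parameter, and the sign convention (a left pixel at column x matches the right pixel at column x - d) are all assumptions made for this example.

```python
import numpy as np


def left_right_consistency_mask(disp_left, disp_right, threshold=1.0):
    """Hypothetical helper: mark left-view pixels whose disparity agrees
    with the disparity stored at the matching right-view pixel."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))

    # Assumed convention: left pixel (y, x) corresponds to right pixel (y, x - d).
    x_right = np.rint(xs - disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)

    # Clip only so that indexing is safe; out-of-bounds pixels are masked out below.
    x_clipped = np.clip(x_right, 0, w - 1)
    rows = np.arange(h)[:, None]
    disp_right_sampled = disp_right[rows, x_clipped]

    # A pixel is kept when both views report (nearly) the same disparity.
    consistent = np.abs(disp_left - disp_right_sampled) <= threshold
    return in_bounds & consistent
```

Pixels where the mask is false would be excluded from the training loss or given a low confidence weight. A full pipeline in the spirit of the abstract would combine this with the other cues it lists, such as structural similarity.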
Related papers
- Context Enhancement with Reconstruction as Sequence for Unified Unsupervised Anomaly Detection [68.74469657656822]
Unsupervised anomaly detection (AD) aims to train robust detection models using only normal samples.
Recent research focuses on a unified unsupervised AD setting in which only one model is trained for all classes.
We introduce a novel Reconstruction as Sequence (RAS) method, which enhances the contextual correspondence during feature reconstruction.
arXiv Detail & Related papers (2024-09-10T07:37:58Z)
- MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data [16.574346252357653]
We propose a novel regularization method based on the idea of renormalization group (RG) in physics.
We show that our method can effectively enhance the performance and stability of GANs under limited data scenarios.
arXiv Detail & Related papers (2024-08-20T18:37:37Z)
- Towards Realistic Data Generation for Real-World Super-Resolution [58.88039242455039]
RealDGen is an unsupervised learning data generation framework designed for real-world super-resolution.
We develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model.
Experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations.
arXiv Detail & Related papers (2024-06-11T13:34:57Z)
- SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects [7.529049797077149]
Acquiring accurate depth information of transparent objects using off-the-shelf RGB-D cameras is a well-known challenge in Computer Vision and Robotics.
NeRFs are learning-free approaches and have demonstrated wide success in novel view synthesis and shape recovery.
Our proposed method, SAID-NeRF, shows significant performance on depth completion datasets for transparent objects and robotic grasping.
arXiv Detail & Related papers (2024-03-28T17:28:32Z)
- BFRFormer: Transformer-based generator for Real-World Blind Face Restoration [37.77996097891398]
We propose a Transformer-based blind face restoration method, named BFRFormer, to reconstruct images with more identity-preserved details in an end-to-end manner.
Our method outperforms state-of-the-art methods on a synthetic dataset and four real-world datasets.
arXiv Detail & Related papers (2024-02-29T02:31:54Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- Unsupervised Seismic Footprint Removal With Physical Prior Augmented Deep Autoencoder [11.303407992331213]
This article proposes a footprint removal network (dubbed FR-Net) for the unsupervised suppression of acquired footprints.
The key to the FR-Net is to design a unidirectional total variation (UTV) model for footprint acquisition according to the intrinsically directional property of noise.
arXiv Detail & Related papers (2023-02-08T07:46:28Z)
- Unsupervised Monocular Depth Learning with Integrated Intrinsics and Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Deep Non-Line-of-Sight Reconstruction [18.38481917675749]
In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently.
We devise a tailored autoencoder architecture, trained end-to-end, that maps transient images directly to a depth map representation.
We demonstrate that our feed-forward network, even though it is trained solely on synthetic data, generalizes to measured data from SPAD sensors and is able to obtain results that are competitive with model-based reconstruction methods.
arXiv Detail & Related papers (2020-01-24T16:05:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.