Procedural Dataset Generation for Zero-Shot Stereo Matching
- URL: http://arxiv.org/abs/2504.16930v1
- Date: Wed, 23 Apr 2025 17:59:33 GMT
- Title: Procedural Dataset Generation for Zero-Shot Stereo Matching
- Authors: David Yan, Alexander Raistrick, Jia Deng,
- Abstract summary: We develop a procedural generator specifically optimized for zero-shot stereo datasets.<n>We report the effects on zero-shot stereo matching performance using standard benchmarks.<n>We open source our system to enable further research on procedural stereo datasets.
- Score: 62.21867807221371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic datasets are a crucial ingredient for training stereo matching networks, but the question of what makes a stereo dataset effective remains largely unexplored. We investigate the design space of synthetic datasets by varying the parameters of a procedural dataset generator, and report the effects on zero-shot stereo matching performance using standard benchmarks. We collect the best settings to produce Infinigen-Stereo, a procedural generator specifically optimized for zero-shot stereo datasets. Models trained only on data from our system outperform robust baselines trained on a combination of existing synthetic datasets and have stronger zero-shot stereo matching performance than public checkpoints from prior works. We open source our system at https://github.com/princeton-vl/InfinigenStereo to enable further research on procedural stereo datasets.
Related papers
- FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation.
We first construct a large-scale (1M stereo pairs) synthetic training dataset.
We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z) - Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLM) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose a LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z) - Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.<n>To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization.<n>To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - TarGEN: Targeted Data Generation with Large Language Models [51.87504111286201]
TarGEN is a multi-step prompting strategy for generating high-quality synthetic datasets.
We augment TarGEN with a method known as self-correction empowering LLMs to rectify inaccurately labeled instances.
A comprehensive analysis of the synthetic dataset compared to the original dataset reveals similar or higher levels of dataset complexity and diversity.
arXiv Detail & Related papers (2023-10-27T03:32:17Z) - On the use of automatically generated synthetic image datasets for
benchmarking face recognition [2.0196229393131726]
Recent advances in Generative Adversarial Networks (GANs) provide a pathway to replace real datasets by synthetic datasets.
Recent advances in Generative Adversarial Networks (GANs) to synthesize realistic face images provide a pathway to replace real datasets by synthetic datasets.
benchmarking results on the synthetic dataset are a good substitution, often providing error rates and system ranking similar to the benchmarking on the real dataset.
arXiv Detail & Related papers (2021-06-08T09:54:02Z) - PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo
Matching [14.603116313499648]
We propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo.
Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution.
We publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes.
arXiv Detail & Related papers (2021-03-12T05:27:14Z) - Semi-synthesis: A fast way to produce effective datasets for stereo
matching [16.602343511350252]
Close-to-real-scene texture rendering is a key factor to boost up stereo matching performance.
We propose semi-synthetic, an effective and fast way to synthesize large amount of data with close-to-real-scene texture.
With further fine-tuning on the real dataset, we also achieve SOTA performance on Middlebury and competitive results on KITTI and ETH3D datasets.
arXiv Detail & Related papers (2021-01-26T14:34:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.