THE Benchmark: Transferable Representation Learning for Monocular Height
Estimation
- URL: http://arxiv.org/abs/2112.14985v2
- Date: Thu, 21 Sep 2023 14:32:17 GMT
- Title: THE Benchmark: Transferable Representation Learning for Monocular Height
Estimation
- Authors: Zhitong Xiong, Wei Huang, Jingtao Hu, and Xiao Xiang Zhu
- Abstract summary: We propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting.
This benchmark dataset includes a newly proposed large-scale synthetic dataset, a newly collected real-world dataset, and four existing datasets from different cities.
In this paper, we propose a scale-deformable convolution module to enhance the window-based Transformer for handling the scale-variation problem in the height estimation task.
- Score: 25.872962101146115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating 3D city models rapidly is crucial for many applications. Monocular
height estimation is one of the most efficient and timely ways to obtain
large-scale geometric information. However, existing works focus primarily on
training and testing models using unbiased datasets, which does not align well
with real-world applications. Therefore, we propose a new benchmark dataset to
study the transferability of height estimation models in a cross-dataset
setting. To this end, we first design and construct a large-scale benchmark
dataset for cross-dataset transfer learning on the height estimation task. This
benchmark dataset includes a newly proposed large-scale synthetic dataset, a
newly collected real-world dataset, and four existing datasets from different
cities. Next, a new experimental protocol, few-shot cross-dataset transfer, is
designed. Furthermore, in this paper, we propose a scale-deformable convolution
module to enhance the window-based Transformer for handling the scale-variation
problem in the height estimation task. Experimental results have demonstrated
the effectiveness of the proposed methods in the traditional and cross-dataset
transfer settings. The datasets and code are publicly available at
https://mediatum.ub.tum.de/1662763 and https://thebenchmarkh.github.io/.
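The abstract names a scale-deformable convolution module for the window-based Transformer but gives no implementation detail here. The following PyTorch sketch is only one plausible reading of the idea, not the authors' code (which is linked above): it predicts per-location sampling offsets together with a per-location scale factor and feeds them to torchvision's deform_conv2d, so the deformation magnitude can follow local object scale. The module name ScaleDeformableConv and all hyperparameters are assumptions.

```python
# Minimal sketch of a scale-aware deformable convolution block (hypothetical;
# not the paper's implementation). Offsets are predicted per location and then
# multiplied by a predicted per-location scale factor before deformable sampling.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class ScaleDeformableConv(nn.Module):  # assumed name, for illustration only
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # One (dy, dx) offset pair per kernel tap, plus one log-scale per location.
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, 3, padding=1)
        self.scale_head = nn.Conv2d(in_ch, 1, 3, padding=1)
        nn.init.zeros_(self.offset_head.weight)
        nn.init.zeros_(self.offset_head.bias)
        nn.init.zeros_(self.scale_head.weight)
        nn.init.zeros_(self.scale_head.bias)

    def forward(self, x):
        offsets = self.offset_head(x)            # (N, 2*k*k, H, W)
        scale = torch.exp(self.scale_head(x))    # (N, 1, H, W), strictly positive
        # Multiply the predicted offsets by the per-location scale so that
        # larger objects can pull the sampling grid further outwards.
        offsets = offsets * scale
        return deform_conv2d(x, offsets, self.weight, self.bias, padding=self.k // 2)


# Usage on dummy backbone features.
feats = torch.randn(2, 64, 32, 32)
out = ScaleDeformableConv(64, 64)(feats)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```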
Related papers
- Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models.
With an improved implementation, it can be scaled to datasets 370x larger than those previously used.
We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z)
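The summary above only names the idea of using XGBoost as the function approximator in diffusion models. The toy sketch below is a hypothetical, small-scale illustration of that idea on 2D data, not the paper's implementation or its calorimeter-simulation setup: gradient-boosted trees are trained to predict the injected noise and then used inside a standard DDPM-style reverse loop.

```python
# Toy sketch: XGBoost as the denoiser in a DDPM-style model on a 2D point set.
# Hypothetical illustration only; the paper's large-scale setup is not shown.
import numpy as np
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)

# Toy data: points on the unit circle.
angle = rng.uniform(0, 2 * np.pi, 5000)
x0 = np.stack([np.cos(angle), np.sin(angle)], axis=1)

# Training pairs: features = (noisy sample, timestep), target = injected noise.
t = rng.integers(0, T, len(x0))
eps = rng.normal(size=x0.shape)
xt = np.sqrt(alpha_bar[t])[:, None] * x0 + np.sqrt(1 - alpha_bar[t])[:, None] * eps
X = np.concatenate([xt, t[:, None] / T], axis=1)
model = MultiOutputRegressor(XGBRegressor(n_estimators=200, max_depth=6))
model.fit(X, eps)

# Reverse process: start from noise and iteratively denoise with the trees.
x = rng.normal(size=(1000, 2))
for step in reversed(range(T)):
    feats = np.concatenate([x, np.full((len(x), 1), step / T)], axis=1)
    eps_hat = model.predict(feats)
    a, ab = 1.0 - betas[step], alpha_bar[step]
    x = (x - (1 - a) / np.sqrt(1 - ab) * eps_hat) / np.sqrt(a)
    if step > 0:
        x += np.sqrt(betas[step]) * rng.normal(size=x.shape)
print(x[:3])  # samples should lie near the unit circle
```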
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized across multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection [34.2238222373818]
Current 3D object detection models follow a single dataset-specific training and testing paradigm.
In this paper, we study the task of training a unified 3D detector from multiple datasets.
We present Uni3D, which leverages a simple data-level correction operation and a specially designed semantic-level coupling-and-recoupling module.
arXiv Detail & Related papers (2023-03-13T05:54:13Z)
- Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives [44.03149443379618]
We propose a cost-effective method for automatically generating a large number of 3D objects with annotations.
These objects are auto-annotated with part labels originating from primitives.
Considering the large overhead of learning on the generated dataset, we propose a dataset distillation strategy.
arXiv Detail & Related papers (2022-05-25T10:07:07Z)
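The Primitive3D summary above describes objects assembled from primitives and auto-annotated with part labels. The snippet below is only a toy numpy illustration of that idea under assumed conventions (two randomly placed primitives sampled as point clouds, with each point labelled by its source primitive); it is not the paper's generation pipeline.

```python
# Toy sketch of the primitive-assembly idea: build a random "object" from two
# simple primitives and label every sampled point with its source primitive.
# Hypothetical illustration only, not the Primitive3D pipeline.
import numpy as np

rng = np.random.default_rng(0)

def sample_box(n, size, center):
    # Uniform points inside an axis-aligned box of the given size.
    return rng.uniform(-0.5, 0.5, (n, 3)) * size + center

def sample_sphere(n, radius, center):
    # Uniform points inside a ball via radial sampling.
    d = rng.normal(size=(n, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    r = radius * rng.uniform(0, 1, (n, 1)) ** (1 / 3)
    return d * r + center

# Randomly assemble one box and one sphere with random placement.
box = sample_box(1024, size=rng.uniform(0.5, 1.5, 3), center=rng.uniform(-1, 1, 3))
sph = sample_sphere(1024, radius=rng.uniform(0.3, 0.8), center=rng.uniform(-1, 1, 3))
points = np.concatenate([box, sph])                                   # (2048, 3) point cloud
labels = np.concatenate([np.zeros(1024), np.ones(1024)]).astype(int)  # free part labels
print(points.shape, np.bincount(labels))
```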
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes using the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer with an efficient ability to perceive global geometric inconsistencies in 3D structures.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
- A Method to Generate High Precision Mesh Model and RGB-D Dataset for 6D Pose Estimation Task [10.24919213221012]
We propose a new method for object reconstruction that takes speed, accuracy, and robustness into account.
Our data is closer to the rendered data, further shrinking the gap between real and synthetic data.
arXiv Detail & Related papers (2020-11-17T16:56:57Z)
- 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method can achieve promising results even without a model pre-trained on large-scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
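The last entry introduces a 3D central difference convolution. The sketch below follows the commonly used central-difference-convolution formulation (a vanilla convolution minus a theta-weighted term that aggregates the kernel onto the centre voxel); it is an assumed reconstruction and may differ from the authors' exact layer.

```python
# Sketch of a 3D central difference convolution (CDC) layer: vanilla 3D
# convolution minus a theta-weighted center-aggregation term. Assumed
# reconstruction; may differ from the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CentralDifferenceConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta > 0:
            # Collapse the kernel over its spatial extent into a 1x1x1 kernel,
            # equivalent to applying sum_n(w_n) to the center voxel only.
            w_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
            out = out - self.theta * F.conv3d(x, w_sum)
        return out


# Usage on a small video clip tensor (batch, channels, frames, height, width).
clip = torch.randn(2, 3, 8, 32, 32)
print(CentralDifferenceConv3d(3, 16)(clip).shape)  # torch.Size([2, 16, 8, 32, 32])
```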
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.