TartanVO: A Generalizable Learning-based VO
- URL: http://arxiv.org/abs/2011.00359v1
- Date: Sat, 31 Oct 2020 20:49:33 GMT
- Title: TartanVO: A Generalizable Learning-based VO
- Authors: Wenshan Wang, Yaoyu Hu, Sebastian Scherer
- Abstract summary: We present the first learning-based visual odometry (VO) model, which generalizes to multiple datasets and real-world scenarios.
We achieve this by leveraging the SLAM dataset TartanAir, which provides a large amount of diverse synthetic data in challenging environments.
- Score: 9.12375405509935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first learning-based visual odometry (VO) model, which
generalizes to multiple datasets and real-world scenarios and outperforms
geometry-based methods in challenging scenes. We achieve this by leveraging the
SLAM dataset TartanAir, which provides a large amount of diverse synthetic data
in challenging environments. Furthermore, to make our VO model generalize
across datasets, we propose an up-to-scale loss function and incorporate the
camera intrinsic parameters into the model. Experiments show that a single
model, TartanVO, trained only on synthetic data, without any finetuning, can be
generalized to real-world datasets such as KITTI and EuRoC, demonstrating
significant advantages over the geometry-based methods on challenging
trajectories. Our code is available at https://github.com/castacks/tartanvo.
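The abstract names two ingredients for cross-dataset generalization, an up-to-scale loss and feeding the camera intrinsics to the model, but does not spell either out. The following is a minimal sketch of one plausible reading, not the authors' implementation: the translation is compared by direction only, after normalizing to unit length, and the intrinsics are encoded as a two-channel map of normalized pixel coordinates. All function names, the 3-vector rotation representation, and the `eps` guard are assumptions made for illustration.

```python
import math


def normalize(v, eps=1e-6):
    """Scale a translation vector to unit length; eps guards against zero motion."""
    norm = math.sqrt(sum(x * x for x in v))
    scale = max(norm, eps)
    return [x / scale for x in v]


def up_to_scale_loss(t_pred, t_true, r_pred, r_true, eps=1e-6):
    """Hypothetical up-to-scale loss: translation direction error plus rotation error.

    Only the direction of the translation is penalized, so a monocular
    model is not punished for the scale it cannot observe.
    """
    t_hat = normalize(t_pred, eps)
    t_ref = normalize(t_true, eps)
    trans_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_hat, t_ref)))
    rot_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(r_pred, r_true)))
    return trans_err + rot_err


def intrinsics_layer(width, height, fx, fy, cx, cy):
    """Hypothetical intrinsics encoding: two height-by-width channels.

    Channel 0 holds (u - cx) / fx and channel 1 holds (v - cy) / fy for
    pixel column u and row v, so the network sees the camera geometry.
    """
    x_chan = [[(u - cx) / fx for u in range(width)] for _ in range(height)]
    y_chan = [[(v - cy) / fy for _ in range(width)] for v in range(height)]
    return x_chan, y_chan
```

Under this sketch, rescaling a predicted translation leaves the loss unchanged: `up_to_scale_loss([2, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0])` evaluates to zero because both vectors point in the same direction.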
Related papers
- UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems.
Our work introduces an innovative approach that integrates LiDAR point cloud data, visual image, and relative depth maps.
One of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
(2024-09-10T03:57:30Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
(2024-06-17T07:43:53Z)
- Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
(2024-06-13T02:03:22Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
(2023-12-22T02:12:08Z)
- The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation [19.945859289278534]
We show that the required characteristics in an optical flow dataset are rather simple and present a simpler synthetic data generation method.
With 2D motion-based datasets, we systematically analyze the simplest yet critical factors for generating synthetic datasets.
(2023-08-14T18:01:45Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
(2023-05-18T16:28:29Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
(2023-05-02T17:59:31Z)
- Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection [4.61143637299349]
Face Presentation Attack Detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks.
This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning.
Experimental results on four datasets demonstrate low half total error rates (HTERs) on three benchmark datasets.
(2023-01-05T16:44:36Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
(2022-07-17T07:05:39Z)
- CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data [2.554905387213586]
We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data.
To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool.
We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
(2021-12-16T18:05:48Z)
- Deep transfer learning for improving single-EEG arousal detection [63.52264764099532]
Two datasets do not contain exactly the same setup leading to degraded performance in single-EEG models.
We train a baseline model and replace the first two layers to prepare the architecture for single-channel electroencephalography data.
Using a fine-tuning strategy, our model yields similar performance to the baseline model and was significantly better than a comparable single-channel model.
(2020-04-10T16:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.