VEnvision3D: A Synthetic Perception Dataset for 3D Multi-Task Model
Research
- URL: http://arxiv.org/abs/2402.19059v2
- Date: Tue, 5 Mar 2024 07:18:18 GMT
- Title: VEnvision3D: A Synthetic Perception Dataset for 3D Multi-Task Model Research
- Authors: Jiahao Zhou, Chen Long, Yue Xie, Jialiang Wang, Boheng Li, Haiping
Wang, Zhe Chen, Zhen Dong
- Abstract summary: VEnvision3D is a large 3D synthetic perception dataset for multi-task learning.
Sub-tasks are inherently aligned in terms of the utilized data.
Our dataset and code will be open-sourced upon acceptance.
- Score: 10.764333144509571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing a unified multi-task foundation model has become a critical
challenge in computer vision research. In the current field of 3D computer
vision, most datasets focus on only a single task, which complicates the
concurrent training requirements of various downstream tasks. In this paper, we
introduce VEnvision3D, a large 3D synthetic perception dataset for multi-task
learning, including depth completion, segmentation, upsampling, place
recognition, and 3D reconstruction. Since the data for each task is collected
in the same environmental domain, sub-tasks are inherently aligned in terms of
the utilized data. This unique attribute helps in exploring the
potential of multi-task models and even foundation models without
separate training methods. Meanwhile, capitalizing on the advantage of virtual
environments being freely editable, we implement some novel settings such as
simulating temporal changes in the environment and sampling point clouds on
model surfaces. These characteristics enable us to present several new
benchmarks. We also perform extensive studies on multi-task end-to-end models,
revealing new observations, challenges, and opportunities for future research.
Our dataset and code will be open-sourced upon acceptance.
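The abstract mentions "sampling point clouds on model surfaces" as one of the dataset's novel settings. Since the authors' code is not yet released, the following is only an illustrative sketch of the standard technique for this task, area-weighted uniform sampling on a triangle mesh; the function name and mesh layout are my own assumptions, not the paper's API.

```python
import numpy as np

def sample_surface(vertices, faces, n_points, seed=0):
    """Uniformly sample points on a triangle mesh surface.

    Triangles are chosen with probability proportional to their area,
    then a point is drawn uniformly inside each chosen triangle via
    barycentric coordinates.
    """
    rng = np.random.default_rng(seed)
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))

    # Triangle areas from the cross product of two edge vectors.
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    probs = areas / areas.sum()

    # Pick triangles area-proportionally, then draw barycentric coords.
    tri = rng.choice(len(faces), size=n_points, p=probs)
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0            # fold points outside back into the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    w = 1.0 - u - v
    return u[:, None] * v0[tri] + v[:, None] * v1[tri] + w[:, None] * v2[tri]

# Example: sample 1000 points on a unit square built from two triangles.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
cloud = sample_surface(verts, faces, 1000)
```

Sampling proportionally to triangle area is what makes the resulting point cloud uniform over the surface rather than biased toward regions with many small triangles.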
Related papers
- LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework.
We introduce key innovations to optimize generative performance for vision tasks.
The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
arXiv Detail & Related papers (2024-11-18T12:05:27Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments shows that our method consistently outperforms existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation [11.608682595506354]
TaskPrompter presents an innovative multi-task prompting framework.
It unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions.
The new benchmark requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.
arXiv Detail & Related papers (2023-04-03T13:41:35Z)
- Multi-task learning from fixed-wing UAV images for 2D/3D city modeling [0.0]
Multi-task learning is an approach to scene understanding which involves multiple related tasks each with potentially limited training data.
In urban management applications such as infrastructure development, traffic monitoring, smart 3D cities, and change detection, automated multi-task data analysis is required.
In this study, a common framework for the performance assessment of multi-task learning methods from fixed-wing UAV images for 2D/3D city modeling is presented.
arXiv Detail & Related papers (2021-08-25T14:45:42Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.