AI-Generated Content (AIGC) for Various Data Modalities: A Survey
- URL: http://arxiv.org/abs/2308.14177v4
- Date: Sat, 21 Oct 2023 15:45:04 GMT
- Title: AI-Generated Content (AIGC) for Various Data Modalities: A Survey
- Authors: Lin Geng Foo, Hossein Rahmani, Jun Liu
- Abstract summary: AIGC methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms.
We provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods.
- Score: 17.787268628612765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have attracted considerable attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Finally, we discuss the challenges and
potential future research directions.
Related papers
- 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities [57.444435654131006] (2024-07-24T16:53:17Z)
3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations.
This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives.
- Markerless Multi-view 3D Human Pose Estimation: a survey [0.49157446832511503] (2024-07-04T10:44:35Z)
3D human pose estimation aims to reconstruct the skeletons of all individuals in a scene by detecting several body joints.
No method is yet capable of solving all the challenges associated with reconstructing the 3D pose.
Further research is still required to develop an approach capable of quickly inferring a highly accurate 3D pose at a bearable computational cost.
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371] (2024-03-28T17:05:04Z)
OV-Uni3DETR achieves state-of-the-art performance in various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
- VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding [47.58359136198136] (2024-03-14T16:13:00Z)
VisionGPT-3D provides a versatile multimodal framework that builds upon the strengths of multimodal foundation models.
It seamlessly integrates various SOTA vision models and automates the selection among them.
It identifies suitable 3D mesh creation algorithms based on 2D depth map analysis and generates optimal results from diverse multimodal inputs.
- A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes [80.20670062509723] (2024-03-12T10:04:08Z)
3D dense captioning is an emerging vision-language bridging task that aims to generate detailed descriptions of 3D scenes.
It presents significant potential and challenges because it represents the real world more closely than 2D visual captioning does.
Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field.
- A Comprehensive Survey on 3D Content Generation [148.434661725242] (2024-02-02T06:20:44Z)
3D content generation has both academic and practical value.
A new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods.
- Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation [28.417029383793068] (2023-10-24T09:39:05Z)
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.
Introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding.
We present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations.
- UniG3D: A Unified 3D Object Generation Dataset [75.49544172927749] (2023-06-19T07:03:45Z)
UniG3D is a unified 3D object generation dataset constructed by employing a universal data transformation pipeline on ShapeNet datasets.
This pipeline converts each raw 3D model into a comprehensive multi-modal data representation.
The selection of data sources for the dataset is based on their scale and quality.