AI-Generated Content (AIGC) for Various Data Modalities: A Survey
- URL: http://arxiv.org/abs/2308.14177v4
- Date: Sat, 21 Oct 2023 15:45:04 GMT
- Title: AI-Generated Content (AIGC) for Various Data Modalities: A Survey
- Authors: Lin Geng Foo, Hossein Rahmani, Jun Liu
- Abstract summary: AIGC methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms.
We provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods.
- Score: 17.787268628612765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to their wide range of
applications and the demonstrated potential of recent works, AIGC developments
have attracted significant attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Finally, we discuss the challenges and
potential future research directions.
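The abstract above distinguishes several 3D shape representations (voxels, point clouds, meshes, and neural implicit fields). As a minimal sketch of how two of these representations relate in practice, the snippet below converts a point cloud into a binary voxel occupancy grid; it assumes only NumPy and a hypothetical point cloud confined to the unit cube, and is an illustrative example rather than code from any of the surveyed methods.

```python
import numpy as np

def pointcloud_to_voxels(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary voxel occupancy grid.

    Illustrative sketch only: assumes the points already lie in the unit cube [0, 1]^3.
    """
    grid = np.zeros((resolution, resolution, resolution), dtype=bool)
    # Map each continuous coordinate to a discrete voxel index.
    indices = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = True
    return grid

# Hypothetical example: a random point cloud sampled inside the unit cube.
points = np.random.rand(2048, 3)
voxels = pointcloud_to_voxels(points, resolution=32)
print(voxels.shape, voxels.sum())  # (32, 32, 32) and the number of occupied voxels
```

A grid like this trades memory (cubic in resolution) for regular structure, whereas point clouds and meshes are sparser but irregular, which is one reason the survey treats each 3D representation as presenting different characteristics and challenges.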
Related papers
- How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM [39.65493154187172]
Large Language Models (LLMs) have been leveraged to enhance 3D understanding tasks, showing potential to surpass traditional computer vision methods.
We propose a taxonomy that categorizes existing methods into three branches: image-based methods deriving 3D understanding from 2D visual data, point cloud-based methods working directly with 3D representations, and hybrid modality-based methods combining multiple data streams.
arXiv Detail & Related papers (2025-04-08T08:11:39Z) - The Evolution and Future Perspectives of Artificial Intelligence Generated Content [7.586328912947784]
This review traces AIGC's evolution through four developmental milestones.
This study aims to guide researchers and practitioners in selecting and optimizing AIGC models.
arXiv Detail & Related papers (2024-12-02T20:16:40Z) - Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR.
First, we illustrate the common data generation-based applications in the SAR field.
Then, the latest GenAI models are systematically reviewed.
Finally, the corresponding applications in the SAR domain are also included.
arXiv Detail & Related papers (2024-11-05T03:06:00Z) - A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities [2.916558661202724]
Human Activity Recognition (HAR) systems aim to understand human behaviour and assign a label to each action.
HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals.
This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2024.
arXiv Detail & Related papers (2024-09-15T10:04:44Z) - 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities [57.444435654131006]
3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations.
This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives.
arXiv Detail & Related papers (2024-07-24T16:53:17Z) - Markerless Multi-view 3D Human Pose Estimation: a survey [0.49157446832511503]
3D human pose estimation aims to reconstruct the skeletons of all individuals in a scene by detecting several body joints.
No method is yet capable of solving all the challenges associated with the reconstruction of the 3D pose.
Further research is still required to develop an approach capable of quickly inferring a highly accurate 3D pose with bearable computation cost.
arXiv Detail & Related papers (2024-07-04T10:44:35Z) - OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding [47.58359136198136]
VisionGPT-3D provides a versatile multimodal framework building upon the strengths of multimodal foundation models.
It seamlessly integrates various SOTA vision models and automates the selection among them.
It identifies suitable 3D mesh creation algorithms corresponding to 2D depth map analysis and generates optimal results based on diverse multimodal inputs.
arXiv Detail & Related papers (2024-03-14T16:13:00Z) - A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes [80.20670062509723]
3D dense captioning is an emerging vision-language bridging task that aims to generate detailed descriptions for 3D scenes.
It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning.
Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field.
arXiv Detail & Related papers (2024-03-12T10:04:08Z) - A Comprehensive Survey on 3D Content Generation [148.434661725242]
3D content generation has both academic and practical value.
A new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods.
arXiv Detail & Related papers (2024-02-02T06:20:44Z) - Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation [28.417029383793068]
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.
Introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding.
We present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations.
arXiv Detail & Related papers (2023-10-24T09:39:05Z) - Bridging MDE and AI: A Systematic Review of Domain-Specific Languages and Model-Driven Practices in AI Software Systems Engineering [1.4853133497896698]
This study aims to investigate the existing model-driven approaches relying on DSL in support of the engineering of AI software systems.
The use of MDE for AI is still in its early stages, and there is no single tool or method that is widely used.
arXiv Detail & Related papers (2023-07-10T14:38:38Z) - UniG3D: A Unified 3D Object Generation Dataset [75.49544172927749]
UniG3D is a unified 3D object generation dataset constructed by employing a universal data transformation pipeline on ShapeNet datasets.
This pipeline converts each raw 3D model into comprehensive multi-modal data representation.
The selection of data sources for our dataset is based on their scale and quality.
arXiv Detail & Related papers (2023-06-19T07:03:45Z) - A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled
Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - Human Action Recognition from Various Data Modalities: A Review [37.07491839026713]
Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action.
HAR has a wide range of applications, and has been attracting increasing attention in the field of computer vision.
We present a survey of recent progress in deep learning methods for HAR based on the type of input data modality.
arXiv Detail & Related papers (2020-12-22T07:37:43Z) - Recent Progress in Appearance-based Action Recognition [73.6405863243707]
Action recognition is the task of identifying various human actions in a video.
Recent appearance-based methods have achieved promising progress towards accurate action recognition.
arXiv Detail & Related papers (2020-11-25T10:18:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.