If generative AI is the answer, what is the question?
- URL: http://arxiv.org/abs/2509.06120v1
- Date: Sun, 07 Sep 2025 16:07:45 GMT
- Title: If generative AI is the answer, what is the question?
- Authors: Ambuj Tewari
- Abstract summary: We explore the foundations of generation as a distinct machine learning task with connections to prediction, compression, and decision-making. We survey five major generative model families: autoregressive models, variational autoencoders, normalizing flows, generative adversarial networks, and diffusion models. We adopt a task-first framing of generation, focusing on what generation is as a machine learning problem, rather than only on how models implement it.
- Score: 28.34285630606338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Beginning with text and images, generative AI has expanded to audio, video, computer code, and molecules. Yet, if generative AI is the answer, what is the question? We explore the foundations of generation as a distinct machine learning task with connections to prediction, compression, and decision-making. We survey five major generative model families: autoregressive models, variational autoencoders, normalizing flows, generative adversarial networks, and diffusion models. We then introduce a probabilistic framework that emphasizes the distinction between density estimation and generation. We review a game-theoretic framework with a two-player adversary-learner setup to study generation. We discuss post-training modifications that prepare generative models for deployment. We end by highlighting some important topics in socially responsible generation such as privacy, detection of AI-generated content, and copyright and IP. We adopt a task-first framing of generation, focusing on what generation is as a machine learning problem, rather than only on how models implement it.
Related papers
- Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models [21.9391057771634]
We propose a framework to address the potential conflict between generation and understanding in a multimodal model. By explicitly leveraging the model's understanding capability during generation, we successfully mitigate the optimization dilemma. This offers valuable insights for designing next-generation unified multimodal models.
arXiv Detail & Related papers (2026-02-17T18:04:13Z)
- BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation [77.55074597806035]
GenBuster-200K is a large-scale, high-quality AI-generated video dataset featuring 200K high-resolution video clips. BusterX is a novel AI-generated video detection and explanation framework leveraging multimodal large language model (MLLM) and reinforcement learning.
arXiv Detail & Related papers (2025-05-19T02:06:43Z)
- Scalable Framework for Classifying AI-Generated Content Across Modalities [0.0]
This paper presents a scalable framework that integrates perceptual hashing, similarity measurement, and pseudo-labeling. Comprehensive evaluations on the Defactify4 dataset demonstrate competitive performance in text and image classification tasks. These results highlight the framework's potential for real-world applications as generative AI continues to evolve.
arXiv Detail & Related papers (2025-02-01T09:28:40Z)
- Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
- On the Limitations and Prospects of Machine Unlearning for Generative AI [7.795648142175443]
Generative AI (GenAI) aims to synthesize realistic and diverse data samples from latent variables or other data modalities.
GenAI has achieved remarkable results in various domains, such as natural language, images, audio, and graphs.
However, it also poses challenges and risks to data privacy, security, and ethics.
arXiv Detail & Related papers (2024-08-01T08:35:40Z)
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment [2.0305676256390934]
We discuss how generative AI can boost idea generation by eliminating human bias.
We also describe how search can verify facts, logic, and context.
This paper introduces a system for knowledge workers, Generate And Search Test, enabling individuals to efficiently create solutions.
arXiv Detail & Related papers (2023-07-17T22:17:40Z)
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
- Conditional Generation with a Question-Answering Blueprint [84.95981645040281]
We advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded.
We obtain blueprints automatically by exploiting state-of-the-art question generation technology.
We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output.
arXiv Detail & Related papers (2022-07-01T13:10:19Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)