Unmaking AI Imagemaking: A Methodological Toolkit for Critical Investigation
- URL: http://arxiv.org/abs/2307.09753v1
- Date: Wed, 19 Jul 2023 05:26:10 GMT
- Title: Unmaking AI Imagemaking: A Methodological Toolkit for Critical Investigation
- Authors: Luke Munn, Liam Magee, Vanicka Arora
- Abstract summary: We provide three methodological approaches for investigating AI image models.
Unmaking the ecosystem analyzes the values, structures, and incentives surrounding the model's production.
Unmaking the output analyzes the model's generative results, revealing its logics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: AI image models are rapidly evolving, disrupting aesthetic production in many
industries. However, understanding of their underlying archives, their logic of
image reproduction, and their persistent biases remains limited. What kind of
methods and approaches could open up these black boxes? In this paper, we
provide three methodological approaches for investigating AI image models and
apply them to Stable Diffusion as a case study. Unmaking the ecosystem analyzes
the values, structures, and incentives surrounding the model's production.
Unmaking the data analyzes the images and text the model draws upon, with their
attendant particularities and biases. Unmaking the output analyzes the model's
generative results, revealing its logics through prompting, reflection, and
iteration. Each mode of inquiry highlights particular ways in which the image
model captures, "understands," and recreates the world. This accessible
framework supports the work of critically investigating generative AI image
models and paves the way for more socially and politically attuned analyses of
their impacts in the world.
Related papers
- Draw an Ugly Person An Exploration of Generative AIs Perceptions of Ugliness [0.0]
Generative AI not only replicates human creativity but also reproduces deep-seated cultural biases. This study investigates how four different generative AI models understand and express ugliness through text and image.
arXiv Detail & Related papers (2025-07-16T13:16:56Z)
- Thinking with Generated Images [30.28526622443551]
We present Thinking with Generated Images, a novel paradigm that transforms how large multimodal models (LMMs) engage with visual reasoning. Our approach enables AI models to engage in the kind of visual imagination and iterative refinement that characterizes human creative, analytical, and strategic thinking.
arXiv Detail & Related papers (2025-05-28T16:12:45Z)
- Exploring Bias in over 100 Text-to-Image Generative Models [49.60774626839712]
We investigate bias trends in text-to-image generative models over time, focusing on the increasing availability of models through open platforms like Hugging Face.
We assess bias across three key dimensions: (i) distribution bias, (ii) generative hallucination, and (iii) generative miss-rate.
Our findings indicate that artistic and style-transferred models exhibit significant bias, whereas foundation models, benefiting from broader training distributions, are becoming progressively less biased.
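The listing does not define these three metrics; as a rough illustration, distribution bias is commonly measured as the divergence between attribute frequencies in generated images and a reference distribution. A minimal sketch under that assumption (the labels, counts, and uniform reference below are all hypothetical):

```python
import math
from collections import Counter

def distribution_bias(labels, reference):
    """KL divergence (bits) from a reference distribution to the
    observed distribution of attribute labels in generated images."""
    n = len(labels)
    observed = {k: v / n for k, v in Counter(labels).items()}
    return sum(p * math.log2(p / reference[k]) for k, p in observed.items())

# Hypothetical: attribute labels for 100 generations from a neutral prompt
labels = ["woman"] * 20 + ["man"] * 80
reference = {"woman": 0.5, "man": 0.5}  # assumed parity baseline
bias = distribution_bias(labels, reference)  # ~0.28 bits of skew
```

A perfectly balanced sample yields 0.0; larger values indicate stronger skew away from the reference distribution.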
arXiv Detail & Related papers (2025-03-11T03:40:44Z)
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) refers to the process of improving visual quality of images while removing degradation, such as noise, blur, weather effects, and so on.
Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.
The all-in-one image restoration (AiOIR) paradigm has emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z)
- GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, one that can enrich personal aesthetic sensibility and foster critical thinking.
Previous work on automatically analyzing artworks mainly focuses on classification, retrieval, and other simple tasks, which falls far short of genuinely analytical AI.
We introduce GalleryGPT, a large multimodal model for composing painting analyses, slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
- DiffusionPID: Interpreting Diffusion via Partial Information Decomposition [24.83767778658948]
We apply information-theoretic principles to decompose the input text prompt into its elementary components.
We analyze how individual tokens and their interactions shape the generated image.
We show that PID is a potent tool for evaluating and diagnosing text-to-image diffusion models.
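The abstract does not spell out the decomposition itself; for intuition, the classic Williams-Beer I_min measure splits the information two sources carry about a target into redundant, unique, and synergistic parts. A toy sketch of that standard formulation on discrete variables (an XOR distribution standing in for tokens and image, not an actual diffusion model):

```python
import math
from collections import defaultdict

def mutual_info(joint, a_idx, b_idx):
    """I(A;B) in bits, from a joint distribution over (x1, x2, y) tuples."""
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for k, p in joint.items():
        a = tuple(k[i] for i in a_idx)
        b = tuple(k[i] for i in b_idx)
        pa[a] += p; pb[b] += p; pab[(a, b)] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in pab.items() if p > 0)

def redundancy(joint):
    """Williams-Beer I_min redundancy of sources X1, X2 about target Y."""
    py = defaultdict(float)
    px = [defaultdict(float), defaultdict(float)]
    pxy = [defaultdict(float), defaultdict(float)]
    for (x1, x2, y), p in joint.items():
        py[y] += p
        for i, x in enumerate((x1, x2)):
            px[i][x] += p
            pxy[i][(x, y)] += p
    total = 0.0
    for y, p_y in py.items():
        # specific information each source provides about the outcome Y=y
        specs = [sum((p_j / p_y) * math.log2((p_j / px[i][x]) / p_y)
                     for (x, yy), p_j in pxy[i].items() if yy == y and p_j > 0)
                 for i in range(2)]
        total += p_y * min(specs)
    return total

# XOR target: neither source alone predicts Y, so all information is synergy
joint = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
R = redundancy(joint)
U1 = mutual_info(joint, (0,), (2,)) - R   # unique to X1
U2 = mutual_info(joint, (1,), (2,)) - R   # unique to X2
S = mutual_info(joint, (0, 1), (2,)) - R - U1 - U2  # synergy: 1 bit
```

For XOR the decomposition gives zero redundancy and zero unique information, with the full 1 bit appearing as synergy: only the token interaction shapes the outcome.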
arXiv Detail & Related papers (2024-06-07T18:17:17Z)
- ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale [20.12991230544801]
Generative image models have emerged as a promising technology to produce realistic images.
There is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images.
We develop ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images.
arXiv Detail & Related papers (2024-04-03T18:20:41Z)
- Generative AI in Vision: A Survey on Models, Metrics and Applications [0.0]
Generative AI models have revolutionized various fields by enabling the creation of realistic and diverse data samples.
Among these models, diffusion models have emerged as a powerful approach for generating high-quality images, text, and audio.
This survey paper provides a comprehensive overview of generative AI diffusion and legacy models, focusing on their underlying techniques, applications across different domains, and their challenges.
arXiv Detail & Related papers (2024-02-26T07:47:12Z)
- Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods on image restoration.
We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR.
We propose five potential and challenging directions for the future research of diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Morphological Image Analysis and Feature Extraction for Reasoning with AI-based Defect Detection and Classification Models [10.498224499451991]
This paper proposes the AI-Reasoner, which extracts morphological characteristics of defects (DefChars) from images.
The AI-Reasoner exports visualisations (i.e. charts) and textual explanations to provide insights into the outputs of mask-based defect detection and classification models.
It also provides effective mitigation strategies to enhance data pre-processing and overall model performance.
arXiv Detail & Related papers (2023-07-21T15:22:32Z)
- Beyond Explaining: Opportunities and Challenges of XAI-Based Model Improvement [75.00655434905417]
Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex machine learning (ML) models.
This paper offers a comprehensive overview of techniques that apply XAI practically to improve various properties of ML models.
We show empirically through experiments on toy and realistic settings how explanations can help improve properties such as model generalization ability or reasoning.
arXiv Detail & Related papers (2022-03-15T15:44:28Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
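The cross reconstruction idea can be sketched without the full adversarial setup: encode each view into a latent, rebuild the *other* view from that latent, and penalize the error. A minimal numpy illustration with random linear maps standing in for learned encoders and decoders (all shapes and names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "views" of the same 8 samples; view_b is a noisy linear
# transform of view_a, so the two views share information
view_a = rng.normal(size=(8, 5))
view_b = view_a @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(8, 4))

# Random linear maps stand in for learned encoders and decoders
enc_a, enc_b = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
dec_ab, dec_ba = rng.normal(size=(3, 4)), rng.normal(size=(3, 5))

z_a, z_b = view_a @ enc_a, view_b @ enc_b      # per-view latents
# Cross reconstruction loss: rebuild each view from the OTHER view's latent
loss = (np.mean((z_a @ dec_ab - view_b) ** 2)
        + np.mean((z_b @ dec_ba - view_a) ** 2))
```

Training would minimize this loss (alongside the adversarial and label-guidance terms described in the abstract) so that each latent retains the information shared across views.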
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.