Speaking images. A novel framework for the automated self-description of artworks
- URL: http://arxiv.org/abs/2506.05368v1
- Date: Wed, 28 May 2025 09:13:41 GMT
- Title: Speaking images. A novel framework for the automated self-description of artworks
- Authors: Valentine Bernasconi, Gustavo Marfia
- Abstract summary: Recent breakthroughs in generative AI have opened the door to new research perspectives in the domain of art and cultural heritage.
We propose a new framework towards the production of self-explaining cultural artifacts using open-source large-language, face detection, text-to-speech and audio-to-animation models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent breakthroughs in generative AI have opened the door to new research perspectives in the domain of art and cultural heritage, where a large number of artifacts have been digitized. There is a need for innovation to ease the access and highlight the content of digital collections. Such innovations develop into creative explorations of the digital image in relation to its malleability and contemporary interpretation, in confrontation to the original historical object. Based on the concept of the autonomous image, we propose a new framework towards the production of self-explaining cultural artifacts using open-source large-language, face detection, text-to-speech and audio-to-animation models. The goal is to start from a digitized artwork and to automatically assemble a short video of the latter where the main character animates to explain its content. The whole process questions cultural biases encapsulated in large-language models, the potential of digital images and deepfakes of artworks for educational purposes, along with concerns of the field of art history regarding such creative diversions.
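The pipeline the abstract describes — digitized artwork in, short self-explaining video out — can be sketched as a simple composition of stages. The following is a hypothetical illustration only: every function name here is a stub invented for this sketch, standing in for the open-source large-language, face-detection, text-to-speech, and audio-to-animation models the paper refers to; it is not the authors' implementation.

```python
# Hypothetical sketch of the self-description pipeline from the abstract:
# artwork image -> LLM description -> main-character face region ->
# narration audio -> animated "speaking" video. All stages are stubs;
# a real system would call the respective open-source models.

def describe_artwork(image_path: str) -> str:
    # Stub for a large-language-model description of the artwork's content.
    return f"A description of the artwork at {image_path}."

def detect_main_face(image_path: str) -> tuple:
    # Stub for a face detector; returns the main character's bounding box
    # as (x, y, width, height).
    return (0, 0, 128, 128)

def synthesize_speech(text: str) -> bytes:
    # Stub for text-to-speech; a real model would return waveform audio.
    return text.encode("utf-8")

def animate_face(image_path: str, box: tuple, audio: bytes) -> str:
    # Stub for audio-to-animation; returns the path of the rendered video.
    return image_path.rsplit(".", 1)[0] + "_speaking.mp4"

def make_speaking_image(image_path: str) -> str:
    """Assemble a short self-explaining video from a digitized artwork."""
    text = describe_artwork(image_path)
    box = detect_main_face(image_path)
    audio = synthesize_speech(text)
    return animate_face(image_path, box, audio)

video_path = make_speaking_image("portrait.jpg")
```

The point of the sketch is only the data flow: each stage consumes the previous stage's output, so the four models can be swapped independently.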
Related papers
- Context-aware Multimodal AI Reveals Hidden Pathways in Five Centuries of Art Evolution [1.8435193934665342]
We use cutting-edge generative AI, specifically Stable Diffusion, to analyze 500 years of Western paintings.
Our findings reveal that contextual information differentiates between artistic periods, styles, and individual artists more successfully than formal elements.
Our generative experiment, infusing prospective contexts into historical artworks, successfully reproduces the evolutionary trajectory of artworks.
arXiv Detail & Related papers (2025-03-15T10:45:04Z) - Diffusion-Based Visual Art Creation: A Survey and New Perspectives [51.522935314070416]
This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives.
Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation.
We aim to shed light on the mechanisms through which AI systems emulate and possibly, enhance human capacities in artistic perception and creativity.
arXiv Detail & Related papers (2024-08-22T04:49:50Z) - Equivalence: An analysis of artists' roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice [16.063735487844628]
This study explores how artists interact with advanced text-to-image Generative AI models.
To exemplify this framework, a case study titled "Equivalence" converts users' speech input into continuously evolving paintings.
This work aims to broaden our understanding of artists' roles and foster a deeper appreciation for the creative aspects inherent in artwork created with Image Generative AI.
arXiv Detail & Related papers (2024-04-29T02:45:23Z) - CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion [73.08710648258985]
Key painting attributes including layout, perspective, shape, and semantics often cannot be conveyed and expressed through style transfer.
Large-scale pretrained text-to-image generation models have demonstrated their capability to synthesize a vast amount of high-quality images.
Our main novel idea is to integrate multimodal semantic information as a synthesis guide into artworks, rather than transferring style to the real world.
arXiv Detail & Related papers (2024-01-25T10:42:09Z) - No Longer Trending on Artstation: Prompt Analysis of Generative AI Art [7.64671395172401]
We collect and analyse over 3 million prompts and the images they generate.
Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms, popular conventional representations and imagery.
arXiv Detail & Related papers (2024-01-24T08:03:13Z) - DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
arXiv Detail & Related papers (2023-11-27T01:24:31Z) - State of the Art on Diffusion Models for Visual Computing [191.6168813012954]
This report introduces the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
arXiv Detail & Related papers (2023-10-11T05:32:29Z) - There Is a Digital Art History [1.0878040851637998]
We revisit Johanna Drucker's question, "Is there a digital art history?"
We focus our analysis on two main aspects that seem to suggest a coming paradigm shift towards a "digital" art history.
arXiv Detail & Related papers (2023-08-14T21:21:03Z) - Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z) - Pathway to Future Symbiotic Creativity [76.20798455931603]
We propose a classification of the creative system with a hierarchy of 5 classes, showing the pathway of creativity evolving from a mimic-human artist to a Machine artist in its own right.
In art creation, machines must understand humans' mental states, including desires, appreciation, and emotions; humans, in turn, need to understand machines' creative capabilities and limitations.
We propose a novel framework for building future Machine artists, which comes with the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle.
arXiv Detail & Related papers (2022-08-18T15:12:02Z) - A Framework and Dataset for Abstract Art Generation via CalligraphyGAN [0.0]
We present a creative framework based on Conditional Generative Adversarial Networks and Contextual Neural Language Model to generate abstract artworks.
Our work is inspired by Chinese calligraphy, which is a unique form of visual art where the character itself is an aesthetic painting.
arXiv Detail & Related papers (2020-12-02T16:24:20Z) - State of the Art on Neural Rendering [141.22760314536438]
We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs.
This report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence.
arXiv Detail & Related papers (2020-04-08T04:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.