PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents
- URL: http://arxiv.org/abs/2505.23130v1
- Date: Thu, 29 May 2025 06:00:51 GMT
- Title: PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents
- Authors: Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu,
- Abstract summary: PhotoArtAgent is an intelligent interpretative system that emulates the creative process of a professional artist. PhotoArtAgent provides transparent, text-based explanations of its creative rationale, fostering meaningful interaction and user control. Experimental results show that PhotoArtAgent not only surpasses existing automated tools in user studies but also achieves results comparable to those of professional human artists.
- Score: 28.44728600512551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Photo retouching is integral to photographic art, extending far beyond simple technical fixes to heighten emotional expression and narrative depth. While artists leverage expertise to create unique visual effects through deliberate adjustments, non-professional users often rely on automated tools that produce visually pleasing results but lack interpretative depth and interactive transparency. In this paper, we introduce PhotoArtAgent, an intelligent system that combines Vision-Language Models (VLMs) with advanced natural language reasoning to emulate the creative process of a professional artist. The agent performs explicit artistic analysis, plans retouching strategies, and outputs precise parameters to Lightroom through an API. It then evaluates the resulting images and iteratively refines them until the desired artistic vision is achieved. Throughout this process, PhotoArtAgent provides transparent, text-based explanations of its creative rationale, fostering meaningful interaction and user control. Experimental results show that PhotoArtAgent not only surpasses existing automated tools in user studies but also achieves results comparable to those of professional human artists.
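The analyze, plan, apply, evaluate, and refine cycle described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `retouch_loop`, `toy_plan`, and `toy_evaluate`, the single `exposure` parameter, and the score threshold are hypothetical and are not taken from the paper. In the actual system the planning and evaluation roles are played by a VLM and the parameters are sent to Lightroom; here they are replaced by toy stand-in functions so the loop runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]  # e.g. {"exposure": 0.3}; parameter names are illustrative


@dataclass
class RetouchResult:
    params: Params
    score: float
    rationale: List[str] = field(default_factory=list)  # one text explanation per round


def retouch_loop(
    plan: Callable[[Params, str], Tuple[Params, str]],
    evaluate: Callable[[Params], Tuple[float, str]],
    initial: Params,
    target_score: float = 0.95,
    max_rounds: int = 10,
) -> RetouchResult:
    """Evaluate, then repeatedly plan new parameters until the score is high enough."""
    params = dict(initial)
    score, feedback = evaluate(params)
    rationale: List[str] = []
    for _ in range(max_rounds):
        if score >= target_score:
            break
        params, why = plan(params, feedback)  # propose new parameters plus a rationale
        rationale.append(why)
        score, feedback = evaluate(params)    # re-assess the (hypothetical) result
    return RetouchResult(params, score, rationale)


# Toy stand-ins (assumptions): a real agent would query a VLM and an editing backend.
def toy_evaluate(params: Params) -> Tuple[float, str]:
    error = 0.5 - params["exposure"]  # pretend the ideal exposure is 0.5
    feedback = "too dark" if error > 0 else "too bright"
    return 1.0 - abs(error), feedback


def toy_plan(params: Params, feedback: str) -> Tuple[Params, str]:
    step = 0.1 if feedback == "too dark" else -0.1
    new = {**params, "exposure": round(params["exposure"] + step, 2)}
    return new, f"Image looks {feedback}; changing exposure by {step:+.1f}."


result = retouch_loop(toy_plan, toy_evaluate, {"exposure": 0.0})
print(result.params, len(result.rationale))
for line in result.rationale:
    print(line)
```

The loop keeps the text rationale for every round, which mirrors the transparency the abstract emphasizes: a user could read each explanation and intervene between rounds.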
Related papers
- JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent [74.64342043677975]
Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. We introduce JarvisArt, a multi-modal language model (MLLM)-driven agent that understands user intent, mimics the reasoning process of professional artists, and intelligently coordinates over 200 retouching tools within Lightroom. To evaluate performance, we develop MMArt-Bench, a novel benchmark constructed from real-world user edits. JarvisArt outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities.
arXiv Detail & Related papers (2025-06-21T06:36:00Z)
- ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models [61.55816738318699]
We propose a novel method for data-use auditing in the text-to-image generation model. ArtistAuditor employs a style extractor to obtain the multi-granularity style representations and treats artworks as samplings of an artist's style. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values.
arXiv Detail & Related papers (2025-04-17T16:15:38Z)
- Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists [1.5296069874080693]
We compare the artistic capabilities of artists and laypeople using generative AI. On average, artists produced more faithful and creative outputs than their lay counterparts. While AI may ease content creation, professional expertise is still valuable.
arXiv Detail & Related papers (2025-01-21T18:53:21Z)
- Emergence of Painting Ability via Recognition-Driven Evolution [49.666177849272856]
We present a model with a stroke branch and a palette branch that together simulate human-like painting. We quantify the efficiency of visual communication by measuring the recognition accuracy achieved with machine vision. Experimental results show that our model achieves superior performance in high-level recognition tasks.
arXiv Detail & Related papers (2025-01-09T04:37:31Z)
- Equivalence: An analysis of artists' roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice [16.063735487844628]
This study explores how artists interact with advanced text-to-image Generative AI models.
To exemplify this framework, a case study titled "Equivalence" converts users' speech input into continuously evolving paintings.
This work aims to broaden our understanding of artists' roles and foster a deeper appreciation for the creative aspects inherent in artwork created with Image Generative AI.
arXiv Detail & Related papers (2024-04-29T02:45:23Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z)
- Learning to Evaluate the Artness of AI-generated Images [64.48229009396186]
ArtScore is a metric designed to evaluate the degree to which an image resembles authentic artworks by artists.
We employ pre-trained models for photo and artwork generation, resulting in a series of mixed models.
This dataset is then employed to train a neural network that learns to estimate quantized artness levels of arbitrary images.
arXiv Detail & Related papers (2023-05-08T17:58:27Z)
- RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions [9.51095076299351]
We develop RePrompt, an automatic method to refine text prompts toward precise expression of generated images.
Inspired by crowdsourced editing strategies, we curated intuitive text features, such as the number and concreteness of nouns.
With model explanations of the proxy model, we curated a rubric to adjust text prompts to optimize image generation for precise emotion expression.
arXiv Detail & Related papers (2023-02-19T03:31:31Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Generative Art Using Neural Visual Grammars and Dual Encoders [25.100664361601112]
A novel algorithm for producing generative art is described.
It allows a user to input a text string and, in a creative response to this string, outputs an image.
arXiv Detail & Related papers (2021-05-01T04:21:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.