GalleryGPT: Analyzing Paintings with Large Multimodal Models
- URL: http://arxiv.org/abs/2408.00491v1
- Date: Thu, 1 Aug 2024 11:52:56 GMT
- Title: GalleryGPT: Analyzing Paintings with Large Multimodal Models
- Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen
- Abstract summary: Artwork analysis is an important and fundamental skill for art appreciation; it can enrich personal aesthetic sensibility and foster critical thinking.
Previous work on automatically analyzing artworks has mainly focused on classification, retrieval, and other simple tasks, which falls far short of the goal of AI-driven art understanding.
We introduce GalleryGPT, a large multimodal model for composing painting analyses, built by slightly modifying and fine-tuning the LLaVA architecture.
- Score: 64.98398357569765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and foster critical thinking. Understanding artworks is challenging due to their subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by data collection and model ability, previous works on automatically analyzing artworks have mainly focused on classification, retrieval, and other simple tasks, which fall far short of the goal of AI-driven art understanding. To facilitate research progress, in this paper we go a step further and compose comprehensive analyses, inspired by the remarkable perception and generation abilities of large multimodal models. Specifically, we first propose the task of composing paragraph-level analysis for artworks, i.e., paintings in this paper, focusing only on visual characteristics to formulate a more comprehensive understanding of artworks. To support research on formal analysis, we collect a large dataset, PaintingForm, with about 19k painting images and 50k analysis paragraphs. We further introduce GalleryGPT, a large multimodal model for composing painting analyses, which is slightly modified and fine-tuned from the LLaVA architecture using our collected data. We conduct formal analysis generation and zero-shot experiments across several datasets to assess the capacity of our model. The results show remarkable performance improvements compared with powerful baseline LMMs, demonstrating its superb art analysis and generalization ability. The code and model are available at: https://github.com/steven640pixel/GalleryGPT.
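A minimal inference sketch follows, assuming the released checkpoint loads through the standard Hugging Face LLaVA classes; the model id and prompt template below are placeholder assumptions, so check the repository above for the actual weights and format:

```python
# Hedged sketch: load a LLaVA-style checkpoint and ask for a formal analysis.
# The model id below is a placeholder, not the confirmed GalleryGPT repo;
# see the GitHub link above for the released weights and exact prompt format.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder; swap in the GalleryGPT weights
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("painting.jpg")
prompt = ("USER: <image>\nWrite a formal analysis of this painting, "
          "focusing only on its visual characteristics. ASSISTANT:")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```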
Related papers
- APDDv2: Aesthetics of Paintings and Drawings Dataset with Artist Labeled Scores and Comments [45.57709215036539]
We introduce the Aesthetics of Paintings and Drawings dataset (APDD), the first comprehensive collection of paintings encompassing 24 distinct artistic categories and 10 aesthetic attributes.
APDDv2 boasts an expanded image corpus and improved annotation quality, featuring detailed language comments.
We present an updated version of the Art Assessment Network for Specific Painting Styles, denoted as ArtCLIP. Experimental validation demonstrates the superior performance of this revised model in the realm of aesthetic evaluation, surpassing its predecessor in accuracy and efficacy.
arXiv Detail & Related papers (2024-11-13T11:46:42Z)
- KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph [24.586916324061168]
We present KALE Knowledge-Augmented vision-Language model for artwork Elaborations.
KALE incorporates the metadata in two ways: firstly as direct textual input, and secondly through a multimodal heterogeneous knowledge graph.
Experimental results demonstrate that KALE achieves strong performance over existing state-of-the-art work across several artwork datasets.
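A minimal sketch of the second ingredient, a heterogeneous graph over artworks and their metadata, built here with PyTorch Geometric's HeteroData; the node types, edge types, and feature sizes are illustrative assumptions, not KALE's actual schema:

```python
# Illustrative heterogeneous graph over artworks and metadata; the node types,
# edge types, and feature sizes are assumptions, not KALE's actual schema.
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
data["artwork"].x = torch.randn(100, 512)  # e.g. visual features per artwork
data["artist"].x = torch.randn(20, 128)    # learned artist embeddings
data["medium"].x = torch.randn(5, 32)      # learned medium embeddings

# Edges link artworks to metadata nodes (indices are toy values).
data["artwork", "created_by", "artist"].edge_index = torch.tensor([[0, 1, 2],
                                                                   [3, 3, 7]])
data["artwork", "uses", "medium"].edge_index = torch.tensor([[0, 1, 2],
                                                             [1, 0, 4]])

print(data)  # summary of the node and edge stores
```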
arXiv Detail & Related papers (2024-09-17T06:39:18Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
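One hedged way to realize this idea, assuming CLIP as the text encoder and a simple cosine alignment loss on a ResNet's pooled features (the paper's exact encoders and losses may differ):

```python
# Hedged sketch: align a vision backbone's features with text embeddings of image
# descriptions from a pre-trained encoder (CLIP here by assumption).
import torch
import torch.nn.functional as F
from torchvision.models import resnet50
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

backbone = resnet50(weights=None)
backbone.fc = torch.nn.Linear(2048, 512)  # project image features to the text dim

images = torch.randn(4, 3, 224, 224)      # toy batch; use real paintings in practice
descriptions = ["an oil portrait of a woman in dark, muted tones"] * 4

with torch.no_grad():
    tokens = tokenizer(descriptions, padding=True, return_tensors="pt")
    text_emb = text_encoder(**tokens).pooler_output   # (4, 512)

img_emb = backbone(images)                             # (4, 512)
align_loss = 1 - F.cosine_similarity(img_emb, text_emb).mean()
align_loss.backward()   # in practice, combine with the usual supervised task loss
```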
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Synergy of Machine and Deep Learning Models for Multi-Painter Recognition [0.0]
We introduce a new large dataset for the painting recognition task, covering 62 artists, and achieve good results.
RegNet performs best at extracting features, while an SVM yields the best painter classification, with performance of up to 85%.
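A hedged sketch of that pipeline; the RegNet variant, pooling choice, and SVM hyperparameters are assumptions:

```python
# Hedged sketch: a frozen RegNet as feature extractor, an SVM as painter classifier.
# The RegNet variant and SVM hyperparameters are assumptions.
import torch
from torchvision.models import regnet_y_16gf, RegNet_Y_16GF_Weights
from sklearn.svm import SVC

weights = RegNet_Y_16GF_Weights.DEFAULT
model = regnet_y_16gf(weights=weights).eval()
model.fc = torch.nn.Identity()       # drop the ImageNet head, keep pooled features
preprocess = weights.transforms()

def extract(pil_images):
    """Return an (N, feature_dim) numpy array for a list of PIL images."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    with torch.no_grad():
        return model(batch).numpy()

# With a labelled painting dataset (not shown here):
# clf = SVC(kernel="rbf", C=10).fit(extract(train_images), train_painters)
# predicted_painters = clf.predict(extract(test_images))
```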
arXiv Detail & Related papers (2023-04-28T11:34:53Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- Holistic Visual-Textual Sentiment Analysis with Prior Models [64.48229009396186]
We propose a holistic method that achieves robust visual-textual sentiment analysis.
The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions.
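A minimal sketch of the final fusion step, with a small Transformer encoder standing in for the BERT-based fusion network; the branch count, dimensions, and pooling are assumptions:

```python
# Illustrative late-fusion head: each branch's feature vector becomes one token of
# a small Transformer encoder (standing in for the BERT-based fusion network),
# followed by a sentiment classifier. All sizes here are assumptions.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, dim=256, num_classes=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, branch_feats):        # (batch, num_branches, dim)
        fused = self.encoder(branch_feats)  # contextualize the branch tokens
        return self.classifier(fused.mean(dim=1))  # pool and predict sentiment

head = FusionHead()
feats = torch.randn(8, 4, 256)  # one feature token per branch, four branches
logits = head(feats)            # (8, 3) sentiment logits
```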
arXiv Detail & Related papers (2022-11-23T14:40:51Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce the Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings [14.89186519385364]
ArtSAGENet is a novel architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs).
We show that our proposed ArtSAGENet captures and encodes valuable dependencies between the artists and the artworks.
Our findings underline the great potential of integrating visual content and semantics for fine art analysis and curation.
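A minimal sketch of the underlying idea, GraphSAGE message passing over nodes that carry CNN features; the graph construction and layer sizes are assumptions, not the ArtSAGENet specifics:

```python
# Hedged sketch: GraphSAGE message passing over nodes that carry CNN features.
# The graph construction, feature sizes, and labels are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

num_nodes, feat_dim, num_classes = 200, 2048, 10     # e.g. ResNet features, 10 styles
x = torch.randn(num_nodes, feat_dim)                 # CNN features per artwork/artist node
edge_index = torch.randint(0, num_nodes, (2, 800))   # stand-in for the real artwork-artist graph

conv1 = SAGEConv(feat_dim, 256)
conv2 = SAGEConv(256, num_classes)

h = F.relu(conv1(x, edge_index))
logits = conv2(h, edge_index)   # per-node predictions, e.g. style or artist attribution
```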
arXiv Detail & Related papers (2021-05-17T23:05:36Z)
- Learning Portrait Style Representations [34.59633886057044]
We study style representations learned by neural network architectures incorporating higher level characteristics.
We find that learned style features vary when triplets annotated by art historians are incorporated as supervision for style similarity.
We also present the first large-scale dataset of portraits prepared for computational analysis.
arXiv Detail & Related papers (2020-12-08T01:36:45Z)
- Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks.
In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
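A hedged sketch of the pose prior only, using an off-the-shelf keypoint detector; deriving action regions and action lines from the keypoints is the paper's contribution and is not reproduced here:

```python
# Hedged sketch of the pose prior only: off-the-shelf human keypoint detection.
# Turning keypoints into action regions/lines is the paper's method, not shown here.
import torch
from torchvision.models.detection import (
    keypointrcnn_resnet50_fpn, KeypointRCNN_ResNet50_FPN_Weights,
)

weights = KeypointRCNN_ResNet50_FPN_Weights.DEFAULT
model = keypointrcnn_resnet50_fpn(weights=weights).eval()

image = torch.rand(3, 600, 800)        # replace with an artwork image tensor in [0, 1]
with torch.no_grad():
    detections = model([image])[0]     # boxes, scores, and keypoints per detected figure

keypoints = detections["keypoints"]    # (num_people, 17, 3): x, y, visibility
```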
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.