Learning AND-OR Templates for Professional Photograph Parsing and Guidance
- URL: http://arxiv.org/abs/2410.06124v1
- Date: Tue, 8 Oct 2024 15:27:19 GMT
- Title: Learning AND-OR Templates for Professional Photograph Parsing and Guidance
- Authors: Xin Jin, Liaoruxing Zhang, Chenyu Fan, Wenbo Yuan,
- Abstract summary: We learn a hierarchical reconfigurable image template from photography images to learn and characterize the "templates" used in these photography images.
Experimental results show that the learned templates describe photography techniques and styles well, and that the proposed approach can assess the quality of photography images as a human being does.
- Score: 5.906114868515906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the development of photography art, many so-called "templates" have been formed, namely visual styles summarized from a series of themed and stylized photography works. In this paper, we propose to analyze and summarize these "templates" in photography by learning composite templates of photography images. We present a framework for learning a hierarchical reconfigurable image template from photography images to characterize the "templates" used in these images. Using this method, we measured the artistic quality of photos and conducted photography guidance. In addition, we utilized the "templates" for guidance in several image generation tasks. Experimental results show that the learned templates describe photography techniques and styles well, and that the proposed approach can assess the quality of photography images as a human being does.
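The abstract does not spell out how the template is built, so the following is only a minimal sketch of what a generic AND-OR template can look like, not the authors' implementation. It assumes the common reading where AND nodes compose parts (scores add), OR nodes pick one of several alternatives (best score wins), and terminals look up a precomputed response. All node classes, primitive names, and response values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Terminal:
    """Leaf node: looks up a precomputed response for one visual primitive."""
    name: str

    def score(self, responses: Dict[str, float]) -> float:
        # A primitive that was not detected contributes nothing.
        return responses.get(self.name, 0.0)


@dataclass
class AndNode:
    """Composition: every child is part of the template, so scores add up."""
    children: List["Node"] = field(default_factory=list)

    def score(self, responses: Dict[str, float]) -> float:
        return sum(child.score(responses) for child in self.children)


@dataclass
class OrNode:
    """Reconfiguration: one alternative is selected, here the best-scoring one."""
    children: List["Node"] = field(default_factory=list)

    def score(self, responses: Dict[str, float]) -> float:
        return max(child.score(responses) for child in self.children)


Node = Union[Terminal, AndNode, OrNode]

# Hypothetical template: subject placement AND lighting AND background.
template = AndNode([
    OrNode([Terminal("subject_on_thirds"), Terminal("subject_centered")]),
    OrNode([Terminal("backlight"), Terminal("side_light")]),
    Terminal("blurred_background"),
])

# Hypothetical detector responses for one photo ("backlight" was not detected).
responses = {
    "subject_on_thirds": 0.9,
    "subject_centered": 0.2,
    "side_light": 0.6,
    "blurred_background": 0.5,
}

total = template.score(responses)
print(f"template score: {total:.2f}")
```

In this toy setup, a higher total means the photo matches the template more closely; how such a score would relate to aesthetic quality is a question for the paper itself.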
Related papers
- The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers [82.99499130882576]
Photographer and curator, Szarkowski insightfully revealed one of the notable gaps between general and aesthetic visual understanding.
We present a novel dataset, PhotoCritique, derived from extensive discussions among professional photographers and enthusiasts.
We also propose a novel model, PhotoEye, featuring a language-guided multi-view vision fusion mechanism to understand image aesthetics from multiple perspectives.
arXiv Detail & Related papers (2025-09-23T02:59:41Z)
- ProCrop: Learning Aesthetic Image Cropping from Professional Compositions [57.949730056500634]
ProCrop is a retrieval-based method that leverages professional photography to guide cropping decisions.
We present a large-scale dataset of 242K weakly-annotated images, generated by out-painting professional images.
This composition-aware dataset generation offers diverse high-quality crop proposals guided by aesthetic principles.
arXiv Detail & Related papers (2025-05-28T15:38:44Z)
- Surrealistic-like Image Generation with Vision-Language Models [4.66729174362509]
In this paper, we explore the generation of images in the style of paintings in the surrealism movement using vision-language generative models.
Our investigation starts with the generation of images under various image generation settings and different models.
We evaluate the performance of selected models and gain valuable insights into their capabilities in generating such images.
arXiv Detail & Related papers (2024-12-18T22:03:26Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Measuring Style Similarity in Diffusion Models [118.22433042873136]
We present a framework for understanding and extracting style descriptors from images.
Our framework comprises a new dataset curated using the insight that style is a subjective property of an image.
We also propose a method to extract style attribute descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model.
arXiv Detail & Related papers (2024-04-01T17:58:30Z)
- An Image-based Typology for Visualization [23.716718517642878]
We present and discuss the results of a qualitative analysis of visual representations from images.
We derive a typology of 10 visualization types of defined groups.
We provide a dataset of 6,833 tagged images and an online tool that can be used to explore and analyze the large set of labeled images.
arXiv Detail & Related papers (2024-03-07T04:33:42Z)
- Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z)
- Subject-driven Text-to-Image Generation via Apprenticeship Learning [83.88256453081607]
We present SuTI, a subject-driven Text-to-Image generator that replaces subject-specific fine tuning with in-context learning.
SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models.
We show that SuTI significantly outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth.
arXiv Detail & Related papers (2023-04-01T00:47:35Z)
- Aesthetic Language Guidance Generation of Images Using Attribute Comparison [68.01313297926109]
Improvements in intelligent equipment and algorithms cannot replace human subjective photography technique.
We divide aesthetic language guidance of image (ALG) into ALG-T and ALG-I.
Both ALG-T and ALG-I conduct aesthetic language guidance respectively for the two types of input images.
arXiv Detail & Related papers (2022-08-09T12:35:23Z)
- ICC++: Explainable Image Retrieval for Art Historical Corpora using Image Composition Canvas [19.80532568090711]
We present a novel approach called Image Composition Canvas (ICC++) to compare and retrieve images having similar compositional elements.
ICC++ is an improvement over ICC specializing in generating low and high-level features (compositional elements) motivated by Max Imdahl's work.
arXiv Detail & Related papers (2022-06-22T14:06:29Z)
- Photozilla: A Large-Scale Photography Dataset and Visual Embedding for 20 Photography Styles [0.6308539010172307]
We introduce a large-scale dataset termed 'Photozilla' that includes over 990k images belonging to 10 different photographic styles.
The dataset is then used to train 3 classification models to automatically classify the images into the relevant style.
We report an accuracy of over 68% for identifying 10 other distinct types of photography styles.
arXiv Detail & Related papers (2021-06-21T18:45:06Z)
- Learning Portrait Style Representations [34.59633886057044]
We study style representations learned by neural network architectures incorporating higher level characteristics.
We find variation in learned style features from incorporating triplets annotated by art historians as supervision for style similarity.
We also present the first large-scale dataset of portraits prepared for computational analysis.
arXiv Detail & Related papers (2020-12-08T01:36:45Z)
- SketchEmbedNet: Learning Novel Concepts by Imitating Drawings [125.45799722437478]
We explore properties of image representations learned by training a model to produce sketches of images.
We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting.
arXiv Detail & Related papers (2020-08-27T16:43:28Z)
- Multiple Generative Adversarial Networks Analysis for Predicting Photographers' Retouching [0.0]
This study explores the possibility of using deep learning methods, and more specifically generative adversarial networks (GANs), to mimic artists' retouching.
arXiv Detail & Related papers (2020-06-03T10:10:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.