Related papers: Aesthetic Language Guidance Generation of Images Using Attribute Comparison

URL: http://arxiv.org/abs/2208.04740v1
Date: Tue, 9 Aug 2022 12:35:23 GMT
Title: Aesthetic Language Guidance Generation of Images Using Attribute Comparison
Authors: Xin Jin, Qiang Deng, Jianwen Lv, Heng Huang, Hao Lou, Chaoen Xiao
Abstract summary: Improvements in intelligent equipment and algorithms cannot replace human subjective photography technology. We divide aesthetic language guidance of image (ALG) into ALG-T and ALG-I. Both ALG-T and ALG-I provide aesthetic language guidance separately for the two types of input images (landscape images and portrait images).
Score: 68.01313297926109
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the vigorous development of mobile photography technology, major mobile phone manufacturers are scrambling to improve the shooting ability of their equipment and the photo beautification algorithms of their software. However, improvements in intelligent equipment and algorithms cannot replace human subjective photography technology. In this paper, we propose the aesthetic language guidance of image (ALG). We divide ALG into ALG-T and ALG-I according to whether the guiding rules are based on photography templates or guidance images. In both ALG-T and ALG-I, we guide photography along three image attributes: color, lighting, and composition. The differences in these three attributes between the input images and the photography templates or guidance images are described in natural language, which constitutes the aesthetic natural language guidance (ALG). Also, because lighting and composition differ between landscape images and portrait images, we divide the input images into landscape images and portrait images. Both ALG-T and ALG-I provide aesthetic language guidance separately for the two types of input images (landscape images and portrait images).
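
The attribute comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the per-attribute scores in [0, 1], the `threshold` parameter, the function name `describe_differences`, and the sentence wording are all hypothetical, chosen only to show how differences in color, lighting, and composition between an input image and a reference might be turned into natural-language guidance.

```python
from typing import Dict, List

# The three attributes named in the abstract.
ATTRIBUTES = ("color", "lighting", "composition")


def describe_differences(
    input_scores: Dict[str, float],
    reference_scores: Dict[str, float],
    threshold: float = 0.1,  # hypothetical tolerance, not taken from the paper
) -> List[str]:
    """Return one guidance sentence per attribute that differs noticeably."""
    guidance = []
    for name in ATTRIBUTES:
        gap = reference_scores[name] - input_scores[name]
        if abs(gap) <= threshold:
            continue  # close enough to the reference; no advice for this attribute
        direction = "increase" if gap > 0 else "reduce"
        guidance.append(
            f"Try to {direction} the {name} score by about {abs(gap):.2f}."
        )
    return guidance


# Hypothetical scores for an input image and a template or guidance image.
example = describe_differences(
    {"color": 0.40, "lighting": 0.75, "composition": 0.62},
    {"color": 0.70, "lighting": 0.50, "composition": 0.60},
)
for sentence in example:
    print(sentence)
```

In this toy setup only color and lighting produce guidance, because the composition gap falls within the assumed tolerance.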

Related papers

Personalized Image Filter: Mastering Your Photographic Style [57.83973633106558]
A generative prior enables PIF to learn the average appearance of photographic concepts. PIF shows outstanding performance in extracting and transferring various kinds of photographic style.
arXiv Detail & Related papers (2025-10-19T11:03:21Z)
The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers [82.99499130882576]
Photographer and curator, Szarkowski insightfully revealed one of the notable gaps between general and aesthetic visual understanding. We present a novel dataset, PhotoCritique, derived from extensive discussions among professional photographers and enthusiasts. We also propose a novel model, PhotoEye, featuring a language-guided multi-view vision fusion mechanism to understand image aesthetics from multiple perspectives.
arXiv Detail & Related papers (2025-09-23T02:59:41Z)
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework [1.5998912722142729]
Let Androids Dream (LAD) is a novel framework for image implication understanding and reasoning. Our framework with the lightweight GPT-4o-mini model achieves SOTA performance compared to 15+ MLLMs on English image implication benchmark. Our work provides new insights into how AI can more effectively interpret image implications.
arXiv Detail & Related papers (2025-05-22T17:59:53Z)
LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z)
Learning AND-OR Templates for Professional Photograph Parsing and Guidance [5.906114868515906]
We learn a hierarchical reconfigurable image template from photography images to learn and characterize the "templates" used in these photography images. Experimental results show that the learned templates can well describe the photography techniques and styles, whereas the proposed approach can assess the quality of photography images as a human being does.
arXiv Detail & Related papers (2024-10-08T15:27:19Z)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions [66.92809850624118]
PixWizard is an image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form language instructions. We cast a variety of vision tasks into a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning dataset. Our experiments demonstrate that PixWizard not only shows impressive generative and understanding abilities for images with diverse resolutions but also exhibits promising generalization capabilities with unseen tasks and human instructions.
arXiv Detail & Related papers (2024-09-23T17:59:46Z)
PhotoBot: Reference-Guided Interactive Photography via Natural Language [15.486784377142314]
PhotoBot is a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We leverage a visual language model (VLM) and an object manipulator to characterize the reference images. We also use a large language model (LLM) to retrieve relevant reference images based on a user's language query.
arXiv Detail & Related papers (2024-01-19T23:34:48Z)
Improving Generalization of Image Captioning with Unsupervised Prompt Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data. GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z)
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations. Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels. Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
Designing An Illumination-Aware Network for Deep Image Relighting [69.750906769976]
We present an Illumination-Aware Network (IAN) which follows the guidance from hierarchical sampling to progressively relight a scene from a single image. In addition, an Illumination-Aware Residual Block (IARB) is designed to approximate the physical rendering process. Experimental results show that our proposed method produces better quantitative and qualitative relighting results than previous state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T16:21:24Z)
Deep Portrait Lighting Enhancement with 3D Guidance [24.01582513386902]
We present a novel deep learning framework for portrait lighting enhancement based on 3D facial guidance. Experimental results on the FFHQ dataset and in-the-wild images show that the proposed method outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality.
arXiv Detail & Related papers (2021-08-04T15:49:09Z)
Recapture as You Want [140.6691726604726]
We present a portrait recapture method enabling users to easily edit their portrait to a desired posture/view, body figure and clothing style. We decompose the editing procedure into semantic-aware geometric and appearance transformation. In appearance transformation, we design two novel modules, Semantic-aware Attentive Transfer (SAT) and Layout Graph Reasoning (LGR).
arXiv Detail & Related papers (2020-06-02T07:43:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.