Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and
Multi-Source Supervision
- URL: http://arxiv.org/abs/2312.08056v1
- Date: Wed, 13 Dec 2023 11:03:07 GMT
- Title: Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and
Multi-Source Supervision
- Authors: Shengguang Wu, Zhenglun Chen, Qi Su
- Abstract summary: We propose a novel knowledge-aware artifact image synthesis approach that brings lost historical objects accurately into their visual forms.
Compared to existing approaches, our proposed model produces higher-quality artifact images that align better with the implicit details and historical knowledge contained within written documents.
- Score: 5.517240672957627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ancient artifacts are an important medium for cultural preservation and
restoration. However, many physical copies of artifacts are either damaged or
lost, leaving a blank space in archaeological and historical studies that calls
for artifact image generation techniques. Despite the significant advancements
in open-domain text-to-image synthesis, existing approaches fail to capture the
important domain knowledge presented in the textual description, resulting in
errors in recreated images such as incorrect shapes and patterns. In this
paper, we propose a novel knowledge-aware artifact image synthesis approach
that brings lost historical objects accurately into their visual forms. We use
a pretrained diffusion model as backbone and introduce three key techniques to
enhance the text-to-image generation framework: 1) we construct prompts with
explicit archaeological knowledge elicited from large language models (LLMs);
2) we incorporate additional textual guidance to correlated historical
expertise in a contrastive manner; 3) we introduce further visual-semantic
constraints on edge and perceptual features that enable our model to learn more
intricate visual details of the artifacts. Compared to existing approaches, our
proposed model produces higher-quality artifact images that align better with
the implicit details and historical knowledge contained within written
documents, thus achieving significant improvements across automatic metrics and
in human evaluation. Our code and data are available at
https://github.com/danielwusg/artifact_diffusion.
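The three supervision signals named in the abstract can be pictured as one weighted training objective. The sketch below is illustrative only and is not taken from the authors' repository. It assumes a standard denoising MSE term, an InfoNCE-style contrastive term between image and text embeddings, and a Sobel-based edge term. The function names, the weights `w_contrast` and `w_edge`, and the temperature are hypothetical. The perceptual term is omitted because it would need a pretrained feature extractor.

```python
import math

def sobel_edges(img):
    """Sobel gradient-magnitude map of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal kernel; its transpose is vertical

    def px(r, c):  # replicate border pixels so the output keeps the input size
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    out = []
    for r in range(h):
        row = []
        for c in range(w):
            gx = sum(kx[i][j] * px(r + i - 1, c + j - 1) for i in range(3) for j in range(3))
            gy = sum(kx[j][i] * px(r + i - 1, c + j - 1) for i in range(3) for j in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out

def mse(a, b):
    """Mean squared error between two equally sized 2-D lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def info_nce(image_emb, text_emb, temperature=0.07):
    """Contrastive loss: image_emb[i] should match text_emb[i] and no other row."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a = [unit(v) for v in image_emb]
    b = [unit(v) for v in text_emb]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)

def combined_loss(noise_pred, noise_true, img_pred, img_true,
                  image_emb, text_emb, w_contrast=0.1, w_edge=0.1):
    """Hypothetical weighted sum of denoising, contrastive, and edge terms."""
    denoise = mse(noise_pred, noise_true)
    contrast = info_nce(image_emb, text_emb)
    edge = mse(sobel_edges(img_pred), sobel_edges(img_true))
    return denoise + w_contrast * contrast + w_edge * edge
```

In this form, a prediction that matches its target exactly has zero denoising and edge terms, so only the contrastive term remains. The actual loss definitions and weights used in the paper should be taken from the linked code.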
Related papers
- ArtiFade: Learning to Generate High-quality Subject from Blemished Images [10.112125529627157]
ArtiFade exploits fine-tuning of a pre-trained text-to-image model, aiming to remove artifacts.
ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model.
arXiv Detail & Related papers (2024-09-05T17:57:59Z)
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
arXiv Detail & Related papers (2023-10-09T10:22:08Z)
- ScrollTimes: Tracing the Provenance of Paintings as a Window into History [35.605930297790465]
The study of cultural artifact provenance, tracing ownership and preservation, holds significant importance in archaeology and art history.
In collaboration with art historians, we examined the handscroll, a traditional Chinese painting form that provides a rich source of historical data.
We present a three-tiered methodology encompassing artifact, contextual, and provenance levels, designed to create a "Biography" for handscroll.
arXiv Detail & Related papers (2023-06-15T03:38:09Z)
- AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation [27.77329906930072]
We propose an unsupervised generative adversarial network called AGTGAN.
By explicit global and local glyph shape style modeling, our method can generate characters with diverse glyphs and realistic textures.
With our generated images, experiments on the largest photographic oracle bone character dataset show that our method can achieve a significant increase in classification accuracy, up to 16.34%.
arXiv Detail & Related papers (2023-03-13T11:18:41Z)
- ArcAid: Analysis of Archaeological Artifacts using Drawings [23.906975910478142]
Archaeology is an intriguing domain for computer vision.
It suffers not only from shortage in (labeled) data, but also from highly-challenging data, which is often extremely abraded and damaged.
This paper proposes a novel semi-supervised model for classification and retrieval of images of archaeological artifacts.
arXiv Detail & Related papers (2022-11-17T11:57:01Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)