From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage
- URL: http://arxiv.org/abs/2401.05520v1
- Date: Wed, 10 Jan 2024 19:34:52 GMT
- Title: From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage
- Authors: Marcellus Amadeus, William Alberto Cruz Castañeda, André Felipe Zanella, Felipe Rodrigues Perche Mahlow
- Abstract summary: This paper addresses the potential of Latent Diffusion Models (LDMs) in representing local cultural concepts, historical figures, and endangered species.
Our objective is to contribute to the broader understanding of how generative models can help to capture and preserve the cultural and historical identity of regions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI has become pervasive in society, witnessing significant
advancements in various domains. Particularly in the realm of Text-to-Image
(TTI) models, Latent Diffusion Models (LDMs) showcase remarkable capabilities
in generating visual content based on textual prompts. This paper addresses the
potential of LDMs in representing local cultural concepts, historical figures,
and endangered species. In this study, we use the cultural heritage of Rio
Grande do Sul (RS), Brazil, as an illustrative case. Our objective is to
contribute to the broader understanding of how generative models can help to
capture and preserve the cultural and historical identity of regions. The paper
outlines the methodology, including subject selection, dataset creation, and
the fine-tuning process. The results showcase the images generated, alongside
the challenges and feasibility of each concept. In conclusion, this work shows
the power of these models to represent and preserve unique aspects of diverse
regions and communities.
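The abstract does not specify the exact fine-tuning recipe, but a DreamBooth-style denoising objective over a small concept dataset is one plausible reading. The sketch below (Python, Hugging Face diffusers) is illustrative only: the base checkpoint, learning rate, and single-prompt conditioning are assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of one fine-tuning step for a Stable Diffusion-style
# LDM on a small custom-concept dataset (e.g., photos of a regional subject).
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed base model
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

vae.requires_grad_(False)            # only the UNet is updated in this sketch
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values: torch.Tensor, prompt: str) -> torch.Tensor:
    """One denoising step: encode images to latents, add noise, predict it.
    pixel_values: (B, 3, 512, 512) images normalized to [-1, 1]."""
    latents = vae.encode(pixel_values).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    ids = tokenizer([prompt] * latents.shape[0], padding="max_length",
                    max_length=tokenizer.model_max_length, truncation=True,
                    return_tensors="pt").input_ids
    hidden = text_encoder(ids)[0]    # CLIP text embeddings for conditioning
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=hidden).sample
    loss = F.mse_loss(pred, noise)   # standard epsilon-prediction objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```

After training, the adapted UNet can be dropped into a StableDiffusionPipeline to render the new concept from text prompts.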
Related papers
- Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model [0.4374837991804086]
Latent Diffusion Models (LDMs) signify a paradigm shift in AI capabilities.
This article delves into the feasibility of employing the Stable Diffusion LDM to illustrate literary works.
arXiv Detail & Related papers (2024-08-01T13:28:15Z)
- GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation that can enrich personal aesthetic sensibility and foster critical thinking.
Previous work on automatically analyzing artworks focuses mainly on classification, retrieval, and other simple tasks, falling far short of this goal.
We introduce GalleryGPT, a large multimodal model for composing painting analyses, slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
- Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models [22.92083941222383]
We introduce DalleStreet, a large-scale dataset generated by DALL-E 3 and validated by humans.
We find disparities in cultural understanding at geographic sub-region levels with both open-source (LLaVA) and closed-source (GPT-4V) models.
Our findings reveal a nuanced picture of the cultural competence of LMMs, highlighting the need to develop culture-aware systems.
arXiv Detail & Related papers (2024-07-02T08:55:41Z)
- Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance [12.33170407159189]
State-of-the-art text-to-image generative models struggle to depict everyday objects with the true diversity of the real world.
We introduce an inference-time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backward denoising steps of latent diffusion models to increase sample diversity.
We find that c-VSG substantially increases the diversity of generated images, both for the worst-performing regions and on average, while maintaining or improving image quality and consistency (see the Vendi Score sketch after this list).
arXiv Detail & Related papers (2024-06-06T23:35:51Z)
- ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models [3.7599363231894185]
We introduce a novel framework designed to produce consistent character representations from a single text prompt.
Our framework outperforms existing methods in generating characters with consistent visual identities.
arXiv Detail & Related papers (2024-06-04T23:39:08Z)
- Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling [49.41822427811098]
We present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregressive latent priors.
Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables.
We show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.
arXiv Detail & Related papers (2024-05-31T17:41:11Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation [69.91401809979709]
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images.
We introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.
The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs.
arXiv Detail & Related papers (2023-12-06T15:07:12Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image (TTI) generation refers to models that process text input and generate high-fidelity images from the descriptions.
Diffusion models are one prominent family of generative models; they learn to reverse a process that systematically adds noise over repeated steps (formalized after this list).
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage [28.301944852273746]
This paper introduces a novel approach to address the challenges of limited annotated data and domain shifts in the cultural heritage domain.
By leveraging generative vision-language models, we augment art datasets by generating diverse variations of artworks conditioned on their captions.
arXiv Detail & Related papers (2023-05-18T16:08:11Z)
- Inspecting the Geographical Representativeness of Images from Text-to-Image Models [52.80961012689933]
We measure the geographical representativeness of generated images using a crowdsourced study comprising 540 participants across 27 countries.
For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States followed by India.
The overall scores for many countries still remain low, highlighting the need for future models to be more geographically inclusive.
arXiv Detail & Related papers (2023-05-18T16:08:11Z)
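The c-VSG entry above builds on the Vendi Score, a kernel-based diversity metric. As a minimal sketch (not the paper's contextualized, guidance-time variant), the score of a sample set is the exponentiated entropy of the eigenvalues of a normalized similarity kernel; the embedding source (e.g., CLIP image features) is an assumption here.

```python
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """Effective number of distinct samples, in [1, n], for (n, d) embeddings."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = x.shape[0]
    k = (x @ x.T) / n                   # cosine kernel; eigenvalues sum to 1
    eigvals = np.linalg.eigvalsh(k)     # symmetric PSD, so real, nonnegative
    eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros (0*log 0 := 0)
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

Higher scores indicate more diverse generations; c-VSG uses this quantity (contextualized with reference exemplars) as a guidance signal during sampling.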
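For reference, the stepwise noising that the RenAIssance survey entry alludes to is conventionally formalized in standard DDPM notation (general background, not specific to any paper above):

```latex
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
```

where beta_s is the per-step noise schedule; the generator is trained to reverse this chain one step at a time.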