Related papers: AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework

AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework

URL: http://arxiv.org/abs/2507.05621v1
Date: Tue, 08 Jul 2025 03:04:08 GMT
Title: AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework
Authors: Suoxiang Zhang, Xiaxi Li, Hongrui Chang, Zhuoyan Hou, Guoxin Wu, Ronghua Ji,
Abstract summary: Domain-specific image generation aims to produce high-quality visual content for specialized fields.<n>Current approaches overlook the inherent dependence between semantic understanding and visual representation in specialized domains.<n>We propose AdaptaGen, a hierarchical semantic optimization framework that integrates matrix-based prompt optimization with multi-perspective understanding.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Domain-specific image generation aims to produce high-quality visual content for specialized fields while ensuring semantic accuracy and detail fidelity. However, existing methods exhibit two critical limitations: First, current approaches address prompt engineering and model adaptation separately, overlooking the inherent dependence between semantic understanding and visual representation in specialized domains. Second, these techniques inadequately incorporate domain-specific semantic constraints during content synthesis, resulting in generation outcomes that exhibit hallucinations and semantic deviations. To tackle these issues, we propose AdaptaGen, a hierarchical semantic optimization framework that integrates matrix-based prompt optimization with multi-perspective understanding, capturing comprehensive semantic relationships from both global and local perspectives. To mitigate hallucinations in specialized domains, we design a cross-modal adaptation mechanism, which, when combined with intelligent content synthesis, enables preserving core thematic elements while incorporating diverse details across images. Additionally, we introduce a two-phase caption semantic transformation during the generation phase. This approach maintains semantic coherence while enhancing visual diversity, ensuring the generated images adhere to domain-specific constraints. Experimental results confirm our approach's effectiveness, with our framework achieving superior performance across 40 categories from diverse datasets using only 16 images per category, demonstrating significant improvements in image quality, diversity, and semantic consistency.

Related papers

Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling [42.46176089721314]
Large Vision and Language Models (LVLMs) have shown strong performance across various vision-language tasks in natural image domains.<n>Their application to remote sensing (RS) remains underexplored due to significant domain differences in visual appearances, object scales, and semantics.<n>We propose a novel LVLM framework tailored for RS understanding, incorporating two core components: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling.
arXiv Detail & Related papers (2025-06-27T02:31:37Z)
Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings. We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features. Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
Controllable Multi-domain Semantic Artwork Synthesis [17.536225601718687]
We propose a dataset that contains 40,000 images of artwork from 4 different domains with their corresponding semantic label maps. We generate the dataset by first extracting semantic maps from landscape photography. We then propose a conditional Generative Adrial Network (GAN)-based approach to generate high-quality artwork.
arXiv Detail & Related papers (2023-08-19T21:16:28Z)
Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story. Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities. We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images. It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image. With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains. Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains. We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs. A unique advantage of our framework is that on forward pass the semantic segmentation network conditions the generative model. We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious. The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving. The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.