ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2304.06297v1
- Date: Thu, 13 Apr 2023 07:07:01 GMT
- Title: ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
- Authors: Hongchen Tan, Baocai Yin, Kun Wei, Xiuping Liu, Xin Li
- Abstract summary: We propose a novel Text-to-Image Generation Network, the Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN).
The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss.
Experimental results on two widely-used datasets show that ALR-GAN performs competitively at the Text-to-Image generation task.
- Score: 42.86424135174045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel Text-to-Image Generation Network, Adaptive Layout
Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the
layout of synthesized images without any auxiliary information. The ALR-GAN
includes an Adaptive Layout Refinement (ALR) module and a Layout Visual
Refinement (LVR) loss. The ALR module aligns the layout structure (which refers
to locations of objects and background) of a synthesized image with that of its
corresponding real image. In the ALR module, we propose an Adaptive Layout
Refinement (ALR) loss that balances the matching of hard and easy features for
more efficient layout structure matching. Based on the refined layout
structure, the LVR loss further refines the visual representation within the
layout area. Experimental results on two widely-used datasets show that ALR-GAN
performs competitively at the Text-to-Image generation task.
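The abstract describes the ALR loss as balancing the matching of hard and easy features when aligning the layout structure of a synthesized image with that of its real counterpart. The paper's exact formulation is not given here; the following is a minimal hypothetical sketch in NumPy of one way such an adaptively weighted feature-matching loss could look, where poorly matched ("hard") features receive larger weights via a focal-style exponent. The function name `alr_loss` and the parameter `gamma` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def alr_loss(real_feats: np.ndarray, fake_feats: np.ndarray, gamma: float = 2.0) -> float:
    """Hypothetical sketch of an adaptively weighted layout-matching loss.

    Per-feature distances between real and synthesized layout features are
    reweighted so that hard (poorly matched) features contribute more than
    easy ones, in the spirit of the hard/easy balancing the abstract
    describes. The actual ALR loss in the paper may differ substantially.
    """
    # Squared L2 distance per feature location, shape: (num_features,)
    dist = np.sum((real_feats - fake_feats) ** 2, axis=-1)
    # Normalize distances to [0, 1] to serve as a matching-difficulty score
    difficulty = dist / (dist.max() + 1e-8)
    # Focal-style adaptive weights: harder features get larger weights
    weights = difficulty ** gamma
    weights = weights / (weights.sum() + 1e-8)
    # Weighted sum of distances as the scalar loss
    return float(np.sum(weights * dist))
```

With identical feature maps the loss is zero; as the synthesized layout diverges from the real one, hard regions dominate the gradient signal, which is the qualitative behavior the abstract attributes to the ALR loss.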
Related papers
- FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing [30.060342890828043]
In text-to-image (T2I) generation, even state-of-the-art models exhibit a significant performance gap when spatial descriptions are provided from perspectives other than the camera.
Our framework improves the performance of state-of-the-art T2I models by up to 5.3% using only a single round of correction.
arXiv Detail & Related papers (2025-09-27T18:42:04Z) - Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking [58.238858463243396]
We present Structured Masking for AR-based Layout-to-Image (SMARLI).
SMARLI integrates spatial layout constraints into AR-based image generation.
It achieves superior layout-aware control while maintaining the structural simplicity and generation efficiency of AR models.
arXiv Detail & Related papers (2025-09-15T15:27:29Z) - Self-supervised Photographic Image Layout Representation Learning [5.009120058742792]
We develop an autoencoder-based network architecture that compresses heterogeneous layout graphs into precise, dimensionally-reduced layout representations.
We introduce the LODB dataset, which features a broader range of layout categories and richer semantics.
Our extensive experimentation on this dataset demonstrates the superior performance of our approach in the realm of photographic image layout representation learning.
arXiv Detail & Related papers (2024-03-06T14:28:53Z) - Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress with the advanced large vision-language (VL) model in CIR task, however, they generally suffer from two main issues: lack of labeled triplets for model training and difficulty of deployment on resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - Spectral Normalization and Dual Contrastive Regularization for Image-to-Image Translation [9.029227024451506]
We propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization.
We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves state-of-the-art performance in multiple tasks.
arXiv Detail & Related papers (2023-04-22T05:22:24Z) - ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing [20.39792009151017]
StyleGAN allows for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space.
Projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability.
We propose a novel two-phase framework by designating two separate networks to tackle editing and reconstruction respectively.
arXiv Detail & Related papers (2023-01-31T04:38:42Z) - Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image.
First, a self-attention mechanism is adopted to model the contextual relationships among layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images.
We construct a large-scale advertisement poster layout designing dataset with delicate layout and saliency map annotations.
arXiv Detail & Related papers (2022-09-02T07:19:12Z) - Robust Reference-based Super-Resolution via C2-Matching [77.51610726936657]
Reference-based Super-Resolution (Ref-SR) has recently emerged as a promising paradigm to enhance a low-resolution (LR) input image by introducing an additional high-resolution (HR) reference image.
Existing Ref-SR methods mostly rely on implicit correspondence matching to borrow HR textures from reference images to compensate for the information loss in input images.
We propose C2-Matching, which produces explicit, robust matching across transformation and resolution.
arXiv Detail & Related papers (2021-06-03T16:40:36Z) - Deep Selective Combinatorial Embedding and Consistency Regularization for Light Field Super-resolution [93.95828097088608]
Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution.
The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR.
We propose a novel learning-based LF spatial SR framework to explore the coherence among LF sub-aperture images.
Experimental results over both synthetic and real-world LF datasets demonstrate the significant advantage of our approach over state-of-the-art methods.
arXiv Detail & Related papers (2020-09-26T08:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.