Related papers: Diverse Multimedia Layout Generation with Multi Choice Learning

URL: http://arxiv.org/abs/2301.06629v1
Date: Mon, 16 Jan 2023 22:53:55 GMT
Title: Diverse Multimedia Layout Generation with Multi Choice Learning
Authors: David D. Nguyen, Surya Nepal, Salil S. Kanhere
Abstract summary: In contrast to standard prediction tasks, there are a range of acceptable layouts which depend on user preferences. Existing machine learning models treat layouts as a single choice prediction problem. We present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and winner-takes-all loss.
Score: 27.542940346258916
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Designing visually appealing layouts for multimedia documents containing text, graphs and images requires a form of creative intelligence. Modelling the generation of layouts has recently gained attention due to its importance in aesthetics and communication style. In contrast to standard prediction tasks, there are a range of acceptable layouts which depend on user preferences. For example, a poster designer may prefer logos on the top-left while another prefers logos on the bottom-right. Both are correct choices yet existing machine learning models treat layouts as a single choice prediction problem. In such situations, these models would simply average over all possible choices given the same input forming a degenerate sample. In the above example, this would form an unacceptable layout with a logo in the centre. In this paper, we present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and winner-takes-all loss to effectively stabilise layout generation. LayoutMCL avoids the averaging problem by using multiple predictors to learn a range of possible options for each layout object. This enables LayoutMCL to generate multiple and diverse layouts from a single input which is in contrast with existing approaches which yield similar layouts with minor variations. Through quantitative benchmarks on real data (magazine, document and mobile app layouts), we demonstrate that LayoutMCL reduces Fréchet Inception Distance (FID) by 83-98% and generates significantly more diversity in comparison to existing approaches.
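Illustrative sketch (not from the paper): the averaging problem and the winner-takes-all idea described in the abstract can be shown on a toy problem. The code below is an assumption-laden simplification, not LayoutMCL itself: each "predictor" is a single scalar rather than an auto-regressive network, the data is a hypothetical 1-D logo position (0.1 for left, 0.9 for right), and the names `wta_loss` and `train_heads` are made up for this example. With one predictor, squared-error training settles near the centre; with two predictors and a winner-takes-all update, each predictor specialises on one of the two acceptable choices.

```python
def wta_loss(predictions, target):
    """Winner-takes-all loss: squared error of the closest predictor only.

    Returns (loss, index_of_winning_predictor).
    """
    errors = [(p - target) ** 2 for p in predictions]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner


def train_heads(heads, targets, lr=0.05, epochs=200):
    """Plain gradient descent in which only the winning head is updated."""
    heads = list(heads)
    for _ in range(epochs):
        for t in targets:
            _, k = wta_loss(heads, t)
            # Gradient of (p - t)^2 w.r.t. p is 2 * (p - t);
            # the losing heads receive no update for this target.
            heads[k] -= lr * 2 * (heads[k] - t)
    return heads


# Hypothetical toy data: horizontal logo position, 0 = left edge, 1 = right edge.
# Half of the "designers" place the logo at 0.1, the other half at 0.9.
targets = [0.1, 0.9] * 10

single = train_heads([0.4], targets)       # one head: single-choice baseline
multi = train_heads([0.4, 0.6], targets)   # two heads: multi-choice prediction

print("single head:", single)  # ends up near the centre, between the two modes
print("two heads:  ", multi)   # one head near 0.1, the other near 0.9
```

With a single head, winner-takes-all reduces to ordinary squared-error training, which is why the baseline drifts toward the mean of the two targets. The initial head values and learning rate are arbitrary choices for this sketch.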

Related papers

AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models [15.483561230992768]
Aesthetic-Aware Preference Alignment (AAPA) is a novel technique to train a Multi-modal Large Language Model (MLLM) for layout interfaces. We propose a data filtering protocol utilizing our layout-quality prediction protocol to ensure training happens on high-quality layouts. We demonstrate the efficacy of our approach on two challenging benchmarks, Crello and Webui, showcasing 17% and 16% improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2025-03-01T19:05:02Z)
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation [78.21134311493303]
Diffusion models have been recognized for their ability to generate images that are not only visually appealing but also of high artistic quality. Previous methods primarily focus on UNet-based models (e.g., SD1.5 and SDXL), and limited effort has explored Multimodal Diffusion Transformers (MM-DiTs). Inheriting the advantages of MM-DiT, we use a separate set of network weights to process the image and text modalities. We contribute a large-scale layout dataset, named LayoutSAM, which includes 2.7 million image-text pairs and 10.7 million entities.
arXiv Detail & Related papers (2024-12-05T04:09:47Z)
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts. We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously. To support instruction-tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
LayoutFlow: Flow Matching for Layout Generation [23.045325684880957]
We propose an efficient flow-based model capable of generating high-quality layouts. Our method learns to gradually move, or flow, the elements of an initial sample until it reaches its final prediction.
arXiv Detail & Related papers (2024-03-27T01:40:21Z)
PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements. We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers. A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation. It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps. It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
arXiv Detail & Related papers (2023-03-21T04:41:02Z)
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
BLT: Bidirectional Layout Transformer for Controllable Layout Generation [27.239276265955954]
We introduce BLT, a bidirectional layout transformer for conditional layout generation. We verify the proposed model on multiple benchmarks with various fidelity metrics. Our results demonstrate two key advances to the state-of-the-art layout transformer models.
arXiv Detail & Related papers (2021-12-09T18:49:28Z)
Constrained Graphic Layout Generation via Latent Optimization [17.05026043385661]
We generate graphic layouts that can flexibly incorporate design semantics, either specified implicitly or explicitly by a user. Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem. We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model.
arXiv Detail & Related papers (2021-08-02T13:04:11Z)
LayoutTransformer: Layout Generation and Completion with Self-attention [105.21138914859804]
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements. Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout.
arXiv Detail & Related papers (2020-06-25T17:56:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.