Diverse Multimedia Layout Generation with Multi Choice Learning
- URL: http://arxiv.org/abs/2301.06629v1
- Date: Mon, 16 Jan 2023 22:53:55 GMT
- Title: Diverse Multimedia Layout Generation with Multi Choice Learning
- Authors: David D. Nguyen, Surya Nepal, Salil S. Kanhere
- Abstract summary: In contrast to standard prediction tasks, there are a range of acceptable layouts which depend on user preferences.
Existing machine learning models treat layouts as a single choice prediction problem.
We present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and winner-takes-all loss.
- Score: 27.542940346258916
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Designing visually appealing layouts for multimedia documents containing
text, graphs and images requires a form of creative intelligence. Modelling the
generation of layouts has recently gained attention due to its importance in
aesthetics and communication style. In contrast to standard prediction tasks,
there are a range of acceptable layouts which depend on user preferences. For
example, a poster designer may prefer logos on the top-left while another
prefers logos on the bottom-right. Both are correct choices yet existing
machine learning models treat layouts as a single choice prediction problem. In
such situations, these models would simply average over all possible choices
given the same input forming a degenerate sample. In the above example, this
would form an unacceptable layout with a logo in the centre. In this paper, we
present an auto-regressive neural network architecture, called LayoutMCL, that
uses multi-choice prediction and winner-takes-all loss to effectively stabilise
layout generation. LayoutMCL avoids the averaging problem by using multiple
predictors to learn a range of possible options for each layout object. This
enables LayoutMCL to generate multiple and diverse layouts from a single input
which is in contrast with existing approaches which yield similar layouts with
minor variations. Through quantitative benchmarks on real data (magazine,
document and mobile app layouts), we demonstrate that LayoutMCL reduces
Fréchet Inception Distance (FID) by 83-98% and generates significantly more
diversity in comparison to existing approaches.
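The averaging problem and the winner-takes-all idea described in the abstract can be illustrated with a toy sketch. This is not the LayoutMCL architecture: the function names (`wta_loss`, `train_heads`), the two-dimensional logo position, the plain squared-error objective, the evenly spaced deterministic initialisation and the hand-written gradient step are all assumptions made for illustration. Each "head" is just a learnable point, and only the head closest to a training target is updated.

```python
import numpy as np


def wta_loss(predictions, target):
    """Winner-takes-all loss over K candidate predictions.

    predictions: array of shape (K, D), one row per predictor head.
    target: array of shape (D,).
    Returns the squared error of the closest head and that head's index.
    """
    errors = np.sum((predictions - target) ** 2, axis=1)
    winner = int(np.argmin(errors))
    return float(errors[winner]), winner


def train_heads(targets, num_heads=2, lr=0.1, epochs=200):
    """Fit num_heads free points to the targets with winner-takes-all updates.

    Assumption: heads start evenly spaced near the centre of the canvas so the
    toy example is deterministic; a real model would use learned parameters.
    """
    dim = targets.shape[1]
    start = np.linspace(0.4, 0.6, num_heads)[:, None]
    heads = np.tile(start, (1, dim))
    for _ in range(epochs):
        for t in targets:
            _, w = wta_loss(heads, t)
            # Gradient of the squared error, applied to the winning head only.
            heads[w] -= lr * 2.0 * (heads[w] - t)
    return heads


# Two equally valid logo positions in normalised canvas coordinates:
# top-left and bottom-right, as in the abstract's example.
targets = np.array([[0.1, 0.1], [0.9, 0.9]])

single = train_heads(targets, num_heads=1)  # stays near the centre
multi = train_heads(targets, num_heads=2)   # one head per preference
print("single head:", single.round(2))
print("two heads:", multi.round(2))
```

With one head, every target updates the same point, so it hovers near the centre of the canvas, which is the degenerate layout the abstract warns about. With two heads, each one specialises on a different preference, so both valid layouts are recovered from the same input.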
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction-tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- LayoutFlow: Flow Matching for Layout Generation [23.045325684880957]
We propose an efficient flow-based model capable of generating high-quality layouts.
Our method learns to gradually move, or flow, the elements of an initial sample until it reaches its final prediction.
arXiv Detail & Related papers (2024-03-27T01:40:21Z)
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation.
It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps.
It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
arXiv Detail & Related papers (2023-03-21T04:41:02Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- BLT: Bidirectional Layout Transformer for Controllable Layout Generation [27.239276265955954]
We introduce BLT, a bidirectional layout transformer for conditional layout generation.
We verify the proposed model on multiple benchmarks with various fidelity metrics.
Our results demonstrate two key advances to the state-of-the-art layout transformer models.
arXiv Detail & Related papers (2021-12-09T18:49:28Z)
- Constrained Graphic Layout Generation via Latent Optimization [17.05026043385661]
We generate graphic layouts that can flexibly incorporate design semantics, either specified implicitly or explicitly by a user.
Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem.
We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model.
arXiv Detail & Related papers (2021-08-02T13:04:11Z)
- LayoutTransformer: Layout Generation and Completion with Self-attention [105.21138914859804]
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects.
We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements.
Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout.
arXiv Detail & Related papers (2020-06-25T17:56:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.