Related papers: Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation

URL: http://arxiv.org/abs/2510.27632v1
Date: Fri, 31 Oct 2025 17:05:10 GMT
Title: Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation
Authors: Riccardo Brioschi, Aleksandr Alekseev, Emanuele Nevali, Berkay Döner, Omar El Malki, Blagoj Mitrevski, Leandro Kieliger, Mark Collier, Andrii Maksai, Jesse Berent, Claudiu Musat, Efi Kokiopoulou
Abstract summary: We introduce an innovative approach exploiting user-provided sketches as intuitive constraints. To tackle the sketch-to-layout problem, we propose a multimodal transformer-based solution. We release O(200k) synthetically-generated sketches for the public datasets above.
Score: 33.89285533035933
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways to incorporate user constraints to guide the layout generation, these constraints often require complex specifications which reduce usability. We introduce an innovative approach exploiting user-provided sketches as intuitive constraints and we demonstrate empirically the effectiveness of this new guidance method, establishing the sketch-to-layout problem as a promising research direction, which is currently under-explored. To tackle the sketch-to-layout problem, we propose a multimodal transformer-based solution using the sketch and the content assets as inputs to produce high-quality layouts. Since collecting sketch training data from human annotators to train our model is very costly, we introduce a novel and efficient method to synthetically generate training sketches at scale. We train and evaluate our model on three publicly available datasets: PubLayNet, DocLayNet and SlidesVQA, demonstrating that it outperforms state-of-the-art constraint-based methods, while offering a more intuitive design experience. In order to facilitate future sketch-to-layout research, we release O(200k) synthetically-generated sketches for the public datasets above. The datasets are available at https://github.com/google-deepmind/sketch_to_layout.

Related papers

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation [6.39528707908268]
There continues to be a lack of large-scale paired datasets for scene sketches. We propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch. We contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets.
arXiv Detail & Related papers (2024-05-29T06:43:49Z)
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources. We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts. Our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with mIoU higher by 12% on Crello.
arXiv Detail & Related papers (2024-04-23T17:58:33Z)
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches [74.63313641583602]
We propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects. Our model is trained and tested end to end, which makes it easy to implement in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z)
SingleSketch2Mesh: Generating 3D Mesh model from Sketch [1.6973426830397942]
Current methods to generate 3D models from sketches are either manual or tightly coupled with 3D modeling platforms. We propose a novel AI-based ensemble approach, SingleSketch2Mesh, for generating 3D models from hand-drawn sketches.
arXiv Detail & Related papers (2022-03-07T06:30:36Z)
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context [112.07988211268612]
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information, drawn by 100 non-expert individuals. We study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions.
arXiv Detail & Related papers (2022-03-04T03:00:51Z)
Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches [4.781615891172263]
We investigate the problem of generating 3D meshes from single free-hand sketches, aiming at fast 3D modeling for novice users. We address the importance of viewpoint specification for overcoming ambiguities, and propose a novel view-aware generation approach.
arXiv Detail & Related papers (2021-05-14T06:27:48Z)
Deep Self-Supervised Representation Learning for Free-Hand Sketch [51.101565480583304]
We tackle the problem of self-supervised representation learning for free-hand sketches. The key to the success of our self-supervised learning paradigm lies in our sketch-specific designs. We show that the proposed approach outperforms the state-of-the-art unsupervised representation learning methods.
arXiv Detail & Related papers (2020-02-03T16:28:29Z)
SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence [68.63311821718416]
We study the problem of multi-view sketch correspondence, where we take as input multiple freehand sketches with different views of the same object. This problem is challenging since the visual features of corresponding points at different views can be very different. We take a deep learning approach and learn a novel local sketch descriptor from data.
arXiv Detail & Related papers (2020-01-16T11:31:21Z)
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by human-drawn sketches. Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.