Revision Matters: Generative Design Guided by Revision Edits
- URL: http://arxiv.org/abs/2406.18559v1
- Date: Mon, 27 May 2024 17:54:51 GMT
- Title: Revision Matters: Generative Design Guided by Revision Edits
- Authors: Tao Li, Chin-Yi Cheng, Amber Xie, Gang Li, Yang Li
- Abstract summary: We investigate how revision edits from human designers can benefit a multimodal generative model.
Our results show that human revision plays a critical role in iterative layout refinement.
Our work paves the way for iterative design revision based on pre-trained large multimodal models.
- Score: 18.976709992275286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Layout design, such as user interface or graphical layout in general, is fundamentally an iterative revision process. Through revising a design repeatedly, the designer converges on an ideal layout. In this paper, we investigate how revision edits from human designers can benefit a multimodal generative model. To do so, we curate an expert dataset that traces how human designers iteratively edit and improve generated layouts toward a prompted language goal. Based on such data, we explore various supervised fine-tuning task setups on top of a Gemini backbone, a large multimodal model. Our results show that human revision plays a critical role in iterative layout refinement. Although noisy, expert revision edits lead our model to a surprisingly strong design FID score of ~10, close to human performance (~6). In contrast, self-revisions that rely entirely on the model's own judgement lead to an echo chamber that prevents iterative improvement and sometimes causes generative degradation. Fortunately, we found that providing human guidance at an early stage plays a critical role in the final generation. In such a human-in-the-loop scenario, our work paves the way for iterative design revision based on pre-trained large multimodal models.
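To make the setup concrete, below is a minimal sketch of the human-in-the-loop refinement loop the abstract describes: the model proposes a revised layout at each step, early steps are conditioned on human revision edits, and later steps fall back to self-revision. The `LayoutState` structure, the `generate_revision` interface, and all field names are illustrative assumptions, not the paper's actual data format or API.

```python
# Illustrative sketch only: the data structures and the model interface below
# are assumptions made for exposition, not the paper's actual implementation.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LayoutState:
    prompt: str                                   # natural-language design goal
    elements: List[dict]                          # e.g. [{"type": "button", "x": 10, "y": 20, "w": 80, "h": 24}]
    history: List[List[dict]] = field(default_factory=list)  # earlier revisions of the layout


def refine_layout(model, state: LayoutState,
                  human_edits: List[Optional[List[dict]]],
                  steps: int = 5) -> LayoutState:
    """Iteratively revise a layout.

    Steps with a human edit available are conditioned on that edit;
    the remaining steps are pure self-revision. Per the abstract,
    guidance in the early steps is what keeps self-revision from
    collapsing into an echo chamber.
    """
    for t in range(steps):
        guidance = human_edits[t] if t < len(human_edits) else None
        proposal = model.generate_revision(       # hypothetical call on the fine-tuned model
            prompt=state.prompt,
            current_layout=state.elements,
            revision_edits=guidance,              # None => self-revision
            history=state.history,
        )
        state.history.append(state.elements)
        state.elements = proposal
    return state
```

Read against the abstract, a loop seeded with expert edits is what reaches the reported design FID of roughly 10 (human performance is roughly 6), whereas running the same loop with no human edits throughout tends to stall or degrade.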
Related papers
- DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation [25.532400438564334]
We propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation.
Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone.
We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transfer-based alignment module to enforce view consistency.
arXiv Detail & Related papers (2024-11-25T11:36:34Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Leveraging Human Revisions for Improving Text-to-Layout Models [16.617352120973806]
We propose using nuanced feedback in the form of human revisions for stronger alignment.
Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts.
arXiv Detail & Related papers (2024-05-16T01:33:09Z)
- Representation Learning for Sequential Volumetric Design Tasks [11.702880690338677]
We propose to encode the design knowledge from a collection of expert or high-performing design sequences.
We develop the preference model by estimating the density of the learned representations.
We train an autoregressive transformer model for sequential design generation.
arXiv Detail & Related papers (2023-09-05T21:21:06Z)
- Edit at your own risk: evaluating the robustness of edited models to distribution shifts [0.0]
We investigate how model editing affects the general robustness of a model, as well as the robustness of the specific behavior targeted by the edit.
We find that edits tend to reduce general robustness, but that the degree of degradation depends on the editing algorithm and layers chosen.
Motivated by these observations, we introduce a new model editing algorithm, 1-layer interpolation (1-LI), which uses weight-space interpolation to navigate the trade-off between editing task accuracy and general robustness.
arXiv Detail & Related papers (2023-02-28T19:41:37Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models, finding that InstructGPT and PEER perform best, but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
- Aspect-Controllable Opinion Summarization [58.5308638148329]
We propose an approach that allows the generation of customized summaries based on aspect queries.
Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers.
We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers.
arXiv Detail & Related papers (2021-09-07T16:09:17Z)
- Test-Time Personalization with a Transformer for Human Pose Estimation [10.776892578762721]
We adapt our pose estimator during test time to exploit person-specific information.
We show significant improvements on pose estimations with our self-supervised personalization.
arXiv Detail & Related papers (2021-07-05T16:48:34Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.