DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation
- URL: http://arxiv.org/abs/2602.23438v1
- Date: Thu, 26 Feb 2026 19:03:05 GMT
- Title: DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation
- Authors: Varun Gopal, Rishabh Jain, Aradhya Mathur, Nikitha SR, Sohan Patnaik, Sudhir Yarram, Mayur Hemani, Balaji Krishnamurthy, Mausoom Sarkar
- Abstract summary: We introduce DesignSense-10k, a large-scale dataset of 10,235 human-annotated preference pairs for graphic layout evaluation. We use semantic grouping, layout prediction, filtering, clustering, and VLM-based refinement to produce high-quality comparison pairs. Our analysis shows that frontier VLMs remain unreliable overall and fail catastrophically on the full four-class task.
- Score: 23.876231579797803
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Graphic layouts serve as an important and engaging medium for visual communication across different channels. While recent layout generation models have demonstrated impressive capabilities, they frequently fail to align with nuanced human aesthetic judgment. Existing preference datasets and reward models trained on text-to-image generation do not generalize to layout evaluation, where the spatial arrangement of identical elements determines quality. To address this critical gap, we introduce DesignSense-10k, a large-scale dataset of 10,235 human-annotated preference pairs for graphic layout evaluation. We propose a five-stage curation pipeline that generates visually coherent layout transformations across diverse aspect ratios, using semantic grouping, layout prediction, filtering, clustering, and VLM-based refinement to produce high-quality comparison pairs. Human preferences are annotated using a 4-class scheme (left, right, both good, both bad) to capture subjective ambiguity. Leveraging this dataset, we train DesignSense, a vision-language model-based classifier that substantially outperforms existing open-source and proprietary models across comprehensive evaluation metrics (54.6% improvement in Macro F1 over the strongest proprietary baseline). Our analysis shows that frontier VLMs remain unreliable overall and fail catastrophically on the full four-class task, underscoring the need for specialized, preference-aware models. Beyond the dataset, our reward model DesignSense yields tangible downstream gains in layout generation. Using our judge during RL-based training improves the generator's win rate by about 3%, while inference-time scaling, which involves generating multiple candidates and selecting the best one, provides a 3.6% improvement. These results highlight the practical impact of specialized, layout-aware preference modeling on real-world layout generation quality.
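As a concrete illustration of the inference-time scaling result above, the sketch below selects the best of N candidate layouts with a pairwise 4-class judge. The `judge` callable and its label strings are hypothetical stand-ins mirroring the annotation scheme; the abstract does not specify the selection procedure, so the round-robin vote here is only one plausible implementation.

```python
from itertools import combinations

def best_of_n(candidates, judge):
    """Pick the candidate with the most pairwise wins (Borda-style).

    `judge(a, b)` is assumed to return one of "left", "right",
    "both_good", "both_bad", mirroring the 4-class annotation scheme.
    """
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in combinations(range(len(candidates)), 2):
        verdict = judge(candidates[i], candidates[j])
        if verdict == "left":
            wins[i] += 1
        elif verdict == "right":
            wins[j] += 1
        elif verdict == "both_good":
            wins[i] += 1
            wins[j] += 1
        # "both_bad" awards no points to either candidate
    return candidates[max(wins, key=wins.get)]
```

Since the round-robin makes O(N^2) judge calls, a single-elimination tournament is a cheaper alternative as N grows, at the cost of a noisier winner.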
Related papers
- Follow-Your-Preference: Towards Preference-Aligned Image Inpainting [17.648992293002088]
We leverage the prominent direct preference optimization approach for alignment training. We employ public reward models to construct preference training datasets. Our work sets a simple yet solid baseline for this promising frontier.
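A minimal sketch of the direct preference optimization objective the summary refers to, assuming summed log-probabilities of the preferred and rejected responses under the policy and a frozen reference model (variable names are illustrative, not from the paper):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss over batches of per-response log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Maximize the log-odds that the policy prefers the chosen response
    # over the rejected one, relative to the reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```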
arXiv Detail & Related papers (2025-09-27T03:32:30Z)
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing [117.58619053719251]
MinerU2.5 is a document parsing model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition.
arXiv Detail & Related papers (2025-09-26T10:45:48Z)
- Anyprefer: An Agentic Framework for Preference Data Synthesis [62.3856754548222]
We propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. External tools are introduced to assist the judge model in accurately rewarding the target model's responses. The synthesized data is compiled into a new preference dataset, Anyprefer-V1, consisting of 58K high-quality preference pairs.
arXiv Detail & Related papers (2025-04-27T15:21:59Z)
- Sentiment Analysis in Software Engineering: Evaluating Generative Pre-trained Transformers [0.0]
This study systematically evaluates the performance of bidirectional transformers, such as BERT, against generative pre-trained transformers, specifically GPT-4o-mini, in SE sentiment analysis. The results reveal that fine-tuned GPT-4o-mini performs comparably to BERT and other bidirectional models on structured and balanced datasets like GitHub and Jira. On linguistically complex datasets with imbalanced sentiment distributions, such as Stack Overflow, the default GPT-4o-mini model exhibits superior generalization, achieving an accuracy of 85.3% compared to the fine-tuned model's 13.1%.
arXiv Detail & Related papers (2025-04-22T14:19:25Z)
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment [59.536850459059856]
We introduce MM-RLHF, a dataset containing 120k fine-grained, human-annotated preference comparison pairs. We propose several key innovations to improve the quality of reward models and the efficiency of alignment algorithms. Our approach is rigorously evaluated across 10 distinct dimensions and 27 benchmarks.
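Reward models trained on such preference comparison pairs conventionally minimize the Bradley-Terry negative log-likelihood; a minimal sketch, assuming a scalar reward head scored on the preferred and rejected responses (the paper's actual innovations may go beyond this baseline objective):

```python
import torch.nn.functional as F

def pairwise_ranking_loss(r_chosen, r_rejected):
    # Bradley-Terry NLL: push the reward of the human-preferred response
    # above the reward of the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```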
arXiv Detail & Related papers (2025-02-14T18:59:51Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Current state-of-the-art methods focus on training innovative architectural designs on confined datasets. We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment [51.14207112118503]
We introduce preference embedding, an approach that embeds responses into a latent space to capture preferences efficiently. We also propose preference score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback. Our method may enhance the alignment of foundation models with nuanced human values.
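A rough sketch of the embedding idea, under the assumption that responses are mapped to latent vectors whose paired dimensions are combined through a skew-symmetric form; the antisymmetry score(v1, v2) = -score(v2, v1) is what lets such a model represent intransitive (cyclic) preferences that a single scalar reward cannot. The parameterization here is illustrative, not the paper's exact one:

```python
import torch

def preference_score(v1, v2):
    """Skew-symmetric pairing of embedding dims: score(v1, v2) == -score(v2, v1)."""
    a1, b1 = v1[..., 0::2], v1[..., 1::2]
    a2, b2 = v2[..., 0::2], v2[..., 1::2]
    return (a1 * b2 - b1 * a2).sum(dim=-1)

# P(response 1 preferred over response 2) under a logistic link:
v1, v2 = torch.randn(8), torch.randn(8)
prob = torch.sigmoid(preference_score(v1, v2))
assert torch.allclose(preference_score(v1, v2), -preference_score(v2, v1))
```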
arXiv Detail & Related papers (2024-10-03T04:22:55Z)
- Dividable Configuration Performance Learning [4.949726352498762]
We propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL.
DaL is based on the new paradigm of dividable learning that builds a model via "divide-and-learn".
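The summary is terse, so the following only gestures at the divide-and-learn idea: partition the configuration samples, fit one local model per division, and route each new configuration to its division's model. The k-means divider and random-forest locals are stand-ins; the paper's own division rule and local learners differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

class DivideAndLearn:
    """Divide-and-learn: one local model per division of the config space."""

    def __init__(self, n_divisions=4):
        self.divider = KMeans(n_clusters=n_divisions)
        self.local_models = {}

    def fit(self, X, y):
        labels = self.divider.fit_predict(X)
        for d in np.unique(labels):
            self.local_models[d] = RandomForestRegressor().fit(
                X[labels == d], y[labels == d])
        return self

    def predict(self, X):
        # Route each configuration to the model of its own division.
        labels = self.divider.predict(X)
        return np.array([self.local_models[d].predict(row[None])[0]
                         for d, row in zip(labels, X)])
```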
arXiv Detail & Related papers (2024-09-11T21:23:23Z)
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
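A hedged sketch of that matching step: pair each sample with any nearby sample whose property value is strictly better, producing (source, target) examples whose differences point along the property gradient. The distance metric and threshold here are assumptions:

```python
import numpy as np

def build_matched_pairs(X, y, dist_threshold):
    """Match each sample to close-by samples that improve the property y."""
    pairs = []
    for i in range(len(X)):
        for j in range(len(X)):
            if i != j and y[j] > y[i] and \
                    np.linalg.norm(X[i] - X[j]) <= dist_threshold:
                pairs.append((X[i], X[j]))
    return pairs
```

A model trained to map each source to its matched target then moves designs 'uphill' at generation time without an explicit discriminator.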
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)