Related papers: OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps

URL: http://arxiv.org/abs/2509.19282v1
Date: Tue, 23 Sep 2025 17:50:00 GMT
Title: OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
Authors: Bingnan Li, Chen-Yu Wang, Haiyang Xu, Xiang Zhang, Ethan Armand, Divyansh Srivastava, Xiaojun Shan, Zeyuan Chen, Jianwen Xie, Zhuowen Tu
Abstract summary: We identify two primary challenges: large overlapping regions and overlapping instances with minimal semantic distinction. We introduce OverLayScore, a novel metric that quantifies the complexity of overlapping bounding boxes. We present OverLayBench, a benchmark featuring high-quality annotations and a balanced distribution across different levels of OverLayScore.
Score: 43.782757481408076
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite steady progress in layout-to-image generation, current methods still struggle with layouts containing significant overlap between bounding boxes. We identify two primary challenges: (1) large overlapping regions and (2) overlapping instances with minimal semantic distinction. Through both qualitative examples and quantitative analysis, we demonstrate how these factors degrade generation quality. To systematically assess this issue, we introduce OverLayScore, a novel metric that quantifies the complexity of overlapping bounding boxes. Our analysis reveals that existing benchmarks are biased toward simpler cases with low OverLayScore values, limiting their effectiveness in evaluating model performance under more challenging conditions. To bridge this gap, we present OverLayBench, a new benchmark featuring high-quality annotations and a balanced distribution across different levels of OverLayScore. As an initial step toward improving performance on complex overlaps, we also propose CreatiLayout-AM, a model fine-tuned on a curated amodal mask dataset. Together, our contributions lay the groundwork for more robust layout-to-image generation under realistic and challenging scenarios. Project link: https://mlpc-ucsd.github.io/OverLayBench.

Related papers

MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval [86.35779264575154]
Multimodal retrieval is becoming a crucial component of modern AI applications, yet its evaluation lags behind the demands of more realistic and challenging scenarios. We introduce MR$^2$-Bench, a reasoning-intensive benchmark for multimodal retrieval.
arXiv Detail & Related papers (2025-09-30T15:09:14Z)
Saccadic Vision for Fine-Grained Visual Classification [10.681604440788854]
Fine-grained visual classification (FGVC) requires distinguishing between visually similar categories through subtle, localized features. Existing part-based methods rely on complex localization networks that learn mappings from pixel to sample space. We propose a two-stage process that first extracts peripheral features and generates a sample map. We employ contextualized selective attention to weigh the impact of each fixation patch before fusing peripheral and focus representations.
arXiv Detail & Related papers (2025-09-19T07:03:37Z)
7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models [3.8123588214292745]
We introduce 7Bench, the first benchmark to assess both semantic and spatial alignment in layout-guided text-to-image generation. We propose an evaluation protocol that builds on existing frameworks by incorporating the layout alignment score to assess spatial accuracy.
arXiv Detail & Related papers (2025-08-18T13:37:51Z)
CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance [47.59187786346473]
We present CountLoop, a training-free framework that provides diffusion models with accurate instance control. Experiments on COCO Count, T2I CompBench, and two new high-instance benchmarks show that CountLoop achieves counting accuracy of up to 98%.
arXiv Detail & Related papers (2025-08-18T11:28:02Z)
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming [58.48683464644606]
We introduce CPRet, a retrieval-oriented benchmark suite for competitive programming. Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation. We develop two task-specialized retrievers based on this dataset: CPRetriever-Code, trained with a novel Group-InfoNCE loss for problem-code alignment, and CPRetriever-Prob, fine-tuned for identifying problem-level similarity.
arXiv Detail & Related papers (2025-05-19T10:07:51Z)
DivCon: Divide and Conquer for Complex Numerical and Spatial Reasoning in Text-to-Image Generation [0.0]
Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements in recent years. Layout is employed as an intermedium to bridge large language models and layout-based diffusion models. We introduce a divide-and-conquer approach which decouples the generation task into multiple subtasks.
arXiv Detail & Related papers (2024-03-11T03:24:44Z)
A simple, strong baseline for building damage detection on the xBD dataset [2.7163621600184773]
We construct a strong baseline method for building damage detection by starting with the winning solution of the xView2 competition. We expect the simplified solution to be more widely and easily applicable. We find that both the complex and the simplified model fail to generalize to unseen locations.
arXiv Detail & Related papers (2024-01-30T18:59:56Z)
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation [147.81509219686419]
We propose a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. Next, we propose IterInpaint, a new baseline that generates foreground and background regions step-by-step via inpainting. We show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
arXiv Detail & Related papers (2023-04-13T16:58:33Z)
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality. The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark. By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
arXiv Detail & Related papers (2021-04-12T07:10:48Z)
1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in the OpenImage Challenge 2019. It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression. We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.