Benchmarking Semantic Segmentation Models via Appearance and Geometry Attribute Editing
- URL: http://arxiv.org/abs/2603.01535v1
- Date: Mon, 02 Mar 2026 07:05:37 GMT
- Title: Benchmarking Semantic Segmentation Models via Appearance and Geometry Attribute Editing
- Authors: Zijin Yin, Bing Li, Kongming Liang, Hao Sun, Zhongjiang He, Zhanyu Ma, Jun Guo
- Abstract summary: We construct an automatic data generation pipeline Gen4Seg to stress-test semantic segmentation models. We benchmark a wide variety of semantic segmentation models, spanning from closed-set models to open-vocabulary large models. Our work suggests the potential of generative models as effective tools for automatically analyzing segmentation models.
- Score: 45.359144639209205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation plays a pivotal role in various applications such as autonomous driving and medical image analysis. When deploying segmentation models in practice, it is critical to test their behavior in varied and complex scenes in advance. In this paper, we construct an automatic data generation pipeline, Gen4Seg, to stress-test semantic segmentation models by generating various challenging samples with different attribute changes. Beyond previous evaluation paradigms focusing solely on global weather and style transfer, we investigate variations in both appearance and geometry attributes at the object and image level. These include object color, material, size, and position, as well as image-level variations such as weather and style. To achieve this, we propose to edit visual attributes of existing real images with precise control of structural information, empowered by diffusion models. In this way, the existing segmentation labels can be reused for the edited images, which greatly reduces labor costs. Using our pipeline, we construct two new benchmarks, Pascal-EA and COCO-EA. We benchmark a wide variety of semantic segmentation models, spanning from closed-set models to open-vocabulary large models. We have several key findings: 1) advanced open-vocabulary models do not exhibit greater robustness compared to closed-set methods under geometric variations; 2) data augmentation techniques, such as CutOut and CutMix, are limited in enhancing robustness against appearance variations; 3) our pipeline can also be employed as a data augmentation tool and improve both in-distribution and out-of-distribution performance. Our work suggests the potential of generative models as effective tools for automatically analyzing segmentation models, and we hope our findings will assist practitioners and researchers in developing more robust and reliable segmentation models.
Related papers
- How to Squeeze An Explanation Out of Your Model [13.154512864498912]
This paper proposes an approach for interpretability that is model-agnostic. By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features. Results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings.
arXiv Detail & Related papers (2024-12-06T15:47:53Z)
- Analyzing Deep Transformer Models for Time Series Forecasting via Manifold Learning [4.910937238451485]
Transformer models have consistently achieved remarkable results in various domains such as natural language processing and computer vision.
Despite ongoing research efforts to better understand these models, the field still lacks a comprehensive understanding.
Time series data, unlike image and text information, can be more challenging to interpret and analyze.
arXiv Detail & Related papers (2024-10-17T17:32:35Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks. Our approach enables versatile capabilities via different inference-time sampling schemes. Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Segmenting Object Affordances: Reproducibility and Sensitivity to Scale [27.277739855754447]
Methods re-use and adapt learning-based architectures for semantic segmentation to the affordance segmentation task.
We benchmark these methods under a reproducible setup on two single-object scenarios.
Our analysis shows that models are not robust to scale variations when object resolutions differ from those in the training set.
arXiv Detail & Related papers (2024-09-03T11:54:36Z)
- Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
In-context segmentation aims to segment objects using given reference images. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. This work approaches the problem from a fresh perspective - unlocking the capability of the latent diffusion model for in-context segmentation.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
- Benchmarking Segmentation Models with Mask-Preserved Attribute Editing [25.052698108262838]
We investigate both local and global attribute variations for robustness evaluation.
To achieve this, we construct a mask-preserved attribute editing pipeline to edit visual attributes of real images.
Using our pipeline, we construct a benchmark covering both object and image attributes.
arXiv Detail & Related papers (2024-03-02T15:20:09Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- The Importance of Downstream Networks in Digital Pathology Foundation Models [1.689369173057502]
We evaluate seven feature extractor models across three different datasets with 162 different aggregation model configurations.
We find that the performance of many current feature extractor models is notably similar.
arXiv Detail & Related papers (2023-11-29T16:54:25Z)
- SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision [54.16430358203348]
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference.
We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv Detail & Related papers (2022-07-13T14:41:05Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)