Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
- URL: http://arxiv.org/abs/2403.01231v2
- Date: Sun, 10 Mar 2024 15:12:35 GMT
- Title: Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
- Authors: Zijin Yin, Kongming Liang, Bing Li, Zhanyu Ma, Jun Guo
- Abstract summary: We investigate both local and global attribute variations for robustness evaluation.
To achieve this, we construct a mask-preserved attribute editing pipeline to edit visual attributes of real images.
Using our pipeline, we construct a benchmark covering both object and image attributes.
- Score: 25.052698108262838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When deploying segmentation models in practice, it is critical to evaluate their behavior in varied and complex scenes. Unlike previous evaluation paradigms, which only consider global attribute variations (e.g. adverse weather), we investigate both local and global attribute variations for robustness evaluation. To achieve this, we construct a mask-preserved attribute editing pipeline that edits the visual attributes of real images with precise control of structural information, so the original segmentation labels can be reused for the edited images. Using our pipeline, we construct a benchmark covering both object and image attributes (e.g. color, material, pattern, style). We evaluate a broad variety of semantic segmentation models, spanning from conventional closed-set models to recent open-vocabulary large models, on their robustness to different types of variations. We find that both local and global attribute variations affect segmentation performance, and that model sensitivity diverges across variation types. We argue that local attributes are as important as global attributes and should be considered in the robustness evaluation of segmentation models. Code: https://github.com/PRIS-CV/Pascal-EA.
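As a rough illustration of the pipeline idea, the sketch below edits an object's attribute with an off-the-shelf inpainting model and composites the result back through the mask, so pixels outside the mask, and hence the original segmentation labels, stay valid. The model choice and the `edit_attribute` helper are assumptions for illustration; the paper's actual pipeline controls structure more precisely.

```python
# Hedged sketch of mask-preserved attribute editing; not the paper's code.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def edit_attribute(image: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Edit the masked object (e.g. prompt="a red car") and composite the
    result back through the mask, so pixels outside the mask -- and hence
    the original segmentation labels -- stay untouched."""
    edited = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    edited = edited.resize(image.size)
    keep = (np.asarray(mask.convert("L")) > 127)[..., None]      # object region
    out = np.where(keep, np.asarray(edited), np.asarray(image))  # paste inside mask only
    return Image.fromarray(out.astype(np.uint8))
```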
Related papers
- Benchmarking Semantic Segmentation Models via Appearance and Geometry Attribute Editing [45.359144639209205]
We construct an automatic data generation pipeline, Gen4Seg, to stress-test semantic segmentation models.
We benchmark a wide variety of semantic segmentation models, spanning from closed-set models to open-vocabulary large models.
Our work suggests the potential of generative models as effective tools for automatically analyzing segmentation models.
arXiv Detail & Related papers (2026-03-02T07:05:37Z)
- HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing [54.970275599061594]
We design an adaptive evaluation framework called Hierarchical and Multi-Grained Inconsistency Evaluation (HMGIE).
HMGIE can provide multi-grained evaluations covering both accuracy and completeness for various image-caption pairs.
To verify the efficacy and flexibility of the proposed framework, we construct MVTID, an image-caption dataset with diverse types and granularities of inconsistencies.
arXiv Detail & Related papers (2024-12-07T15:47:49Z)
- Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety.
Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
- Grounding Everything: Emerging Localization Properties in Vision-Language Transformers [51.260510447308306]
We show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning.
We propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.
We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation.
arXiv Detail & Related papers (2023-12-01T19:06:12Z)
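A minimal sketch of the self-self attention idea named in the GEM entry above: attention weights are computed between a projection and itself (q-q, k-k, or v-v) instead of between queries and keys. The helper is illustrative and omits GEM's normalization and ensembling details.

```python
# Hedged sketch of self-self attention in the spirit of GEM / CLIPSurgery.
import torch
import torch.nn.functional as F

def self_self_attention(x, proj, scale):
    # x: (batch, tokens, dim); proj: one of the frozen q/k/v projections
    p = proj(x)                                                # project tokens once
    attn = F.softmax(p @ p.transpose(-2, -1) * scale, dim=-1)  # p-p affinity
    return attn @ p                                            # aggregate with itself

# usage with a hypothetical frozen value projection:
v_proj = torch.nn.Linear(512, 512)
tokens = torch.randn(1, 197, 512)
out = self_self_attention(tokens, v_proj, scale=512 ** -0.5)
```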
- Attribute Localization and Revision Network for Zero-Shot Learning [13.530912616208722]
Zero-shot learning enables the model to recognize unseen categories with the aid of auxiliary semantic information such as attributes.
In this paper, we find that the choice between local and global features is not a zero-sum game; global features can also contribute to the understanding of attributes.
arXiv Detail & Related papers (2023-10-11T14:50:52Z)
- Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network [1.8899300124593648]
We propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone.
Our proposed method achieved consistent state-of-the-art performance on the FashionAI, DARN, DeepFashion, and Zappos50K benchmark datasets.
arXiv Detail & Related papers (2023-07-25T04:48:03Z)
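The conditional cross-attention idea above might look roughly like the following sketch, where a learned condition embedding for an attribute type queries shared backbone tokens to produce one disentangled embedding per attribute from a single network. Names and sizes are illustrative assumptions, not the authors' code.

```python
# Illustrative conditional cross-attention sketch; not the authors' implementation.
import torch
import torch.nn as nn

class ConditionalCrossAttention(nn.Module):
    def __init__(self, dim, num_conditions):
        super().__init__()
        self.cond = nn.Embedding(num_conditions, dim)  # one query per attribute space
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats, cond_id):
        # feats: (B, N, dim) backbone tokens; cond_id: (B,) attribute index
        q = self.cond(cond_id).unsqueeze(1)            # (B, 1, dim) condition query
        emb, _ = self.attn(q, feats, feats)            # attend over image tokens
        return emb.squeeze(1)                          # attribute-specific embedding

feats = torch.randn(2, 49, 256)
module = ConditionalCrossAttention(256, num_conditions=5)
emb = module(feats, torch.tensor([0, 3]))  # e.g. "color" vs "pattern" embeddings
```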
- Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation [27.587905673112473]
Fashion attribute editing aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs, where the generator explicitly learns the target attributes and directly executes the conversion.
We explore classifier-guided diffusion, which leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet.
arXiv Detail & Related papers (2022-10-12T02:21:18Z)
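For reference, classifier guidance in its standard form shifts the reverse-diffusion mean by the gradient of a classifier's log-probability for the target attribute. This is the generic technique, not the paper's exact formulation; the classifier and guidance scale are placeholders.

```python
# Minimal sketch of classifier guidance (Dhariwal & Nichol style).
import torch

def guided_mean(mean, variance, x_t, classifier, target, scale=3.0):
    """Shift the reverse-diffusion mean toward samples the attribute
    classifier scores highly for `target`."""
    x = x_t.detach().requires_grad_(True)
    log_prob = classifier(x).log_softmax(dim=-1)[torch.arange(len(target)), target]
    grad = torch.autograd.grad(log_prob.sum(), x)[0]  # d log p(y|x_t) / d x_t
    return mean + scale * variance * grad             # guided posterior mean
```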
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
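A hedged sketch of the attribute-prototype idea above: learnable prototypes are correlated with local feature maps to give attribute similarity maps, and presence scores come from max pooling, so localization emerges from class-level attributes only. Shapes and names are illustrative.

```python
# Illustrative attribute-prototype module; not the authors' exact code.
import torch
import torch.nn as nn

class AttributePrototypes(nn.Module):
    def __init__(self, num_attrs, dim):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_attrs, dim))

    def forward(self, fmap):
        # fmap: (B, dim, H, W) local features from a CNN backbone
        sim = torch.einsum("ad,bdhw->bahw", self.prototypes, fmap)  # similarity maps
        attr_scores = sim.flatten(2).max(dim=2).values  # max-pool: attribute presence
        return attr_scores, sim

fmap = torch.randn(2, 512, 7, 7)
scores, maps = AttributePrototypes(num_attrs=85, dim=512)(fmap)
```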
- A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [58.633364000258645]
We introduce RIVAL10, a dataset consisting of roughly 26k instances over 10 classes.
We evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes.
In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, adversarial training).
arXiv Detail & Related papers (2022-01-26T06:31:28Z)
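The kind of probe such a study runs can be sketched as follows: corrupt only the foreground (or only the background) through the object mask, then compare accuracy drops. `model`, the noise level, and the mask format are assumptions for illustration.

```python
# Illustrative foreground/background sensitivity probe; not the paper's protocol.
import torch

def masked_noise(images, masks, sigma=0.25, corrupt_foreground=True):
    """Add Gaussian noise inside (or outside) the object mask only."""
    noise = sigma * torch.randn_like(images)
    region = masks if corrupt_foreground else 1 - masks
    return (images + noise * region).clamp(0, 1)

@torch.no_grad()
def sensitivity(model, images, masks, labels):
    fg = model(masked_noise(images, masks, corrupt_foreground=True)).argmax(1)
    bg = model(masked_noise(images, masks, corrupt_foreground=False)).argmax(1)
    return (fg == labels).float().mean(), (bg == labels).float().mean()
```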
- Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains [48.17225008334873]
We propose a feature generative framework integrated with a COntext COnditional Adaptive (COCOA) Batch-Normalization.
The generated visual features better capture the underlying data distribution enabling us to generalize to unseen classes and domains at test-time.
We thoroughly evaluate and analyse our approach on the established large-scale DomainNet benchmark.
arXiv Detail & Related papers (2021-07-15T17:51:16Z)
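Context-conditional batch normalization generally follows the conditional-BN pattern below, where a context vector (e.g. a domain embedding) predicts the affine parameters. This is the generic pattern, not the authors' exact COCOA module.

```python
# Generic conditional batch-norm sketch; an assumption, not the COCOA code.
import torch
import torch.nn as nn

class ContextConditionalBN(nn.Module):
    def __init__(self, num_features, context_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)  # normalize only
        self.gamma = nn.Linear(context_dim, num_features)     # context -> scale
        self.beta = nn.Linear(context_dim, num_features)      # context -> shift

    def forward(self, x, context):
        h = self.bn(x)
        g = self.gamma(context).unsqueeze(-1).unsqueeze(-1)
        b = self.beta(context).unsqueeze(-1).unsqueeze(-1)
        return g * h + b

x, ctx = torch.randn(4, 64, 8, 8), torch.randn(4, 16)
out = ContextConditionalBN(64, 16)(x, ctx)
```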
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace [75.75927763429745]
We present StylEx, a method for training a generative model to explain semantic attributes of an image.
StylEx finds attributes that align well with semantic ones and generates meaningful, human-interpretable, image-specific explanations.
arXiv Detail & Related papers (2021-04-27T17:57:19Z)
- Importance of Self-Consistency in Active Learning for Semantic Segmentation [31.392212891018655]
We show that self-consistency can be a powerful source of self-supervision to improve the performance of a data-driven model with access to only a small amount of labeled data.
In our proposed active learning framework, we iteratively extract small image patches that need to be labeled.
This lets us identify the image patches that the current model struggles most to classify.
arXiv Detail & Related papers (2020-08-04T22:18:35Z)
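One simple instantiation of self-consistency scoring (an assumption, not necessarily the authors' criterion) measures disagreement between predictions on an image and its horizontal flip, then ranks patches by average disagreement:

```python
# Illustrative self-consistency patch scoring for active learning.
import torch

@torch.no_grad()
def inconsistency_map(model, image):
    # image: (1, 3, H, W); model returns per-pixel class logits (1, C, H, W)
    p = model(image).softmax(dim=1)
    p_flip = model(torch.flip(image, dims=[-1])).softmax(dim=1)
    return (p - torch.flip(p_flip, dims=[-1])).abs().sum(dim=1)  # (1, H, W)

def score_patches(inc_map, patch=64):
    # average inconsistency per non-overlapping patch; label the top scorers
    return torch.nn.functional.avg_pool2d(inc_map.unsqueeze(1), patch).squeeze(1)
```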
- Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Key regions of the foreground object can be accurately located via semantic features, while apparent features provide richer detailed expression.
By fusing semantic and apparent features, and by cascading modules for intra-image adaptive feature weight learning and inter-image common feature learning, the method achieves performance that significantly exceeds the baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
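A minimal sketch of the fusion idea above, under stated assumptions: each feature tensor is reduced to a normalized single-channel response map and the two maps are blended; the fixed weight `w` stands in for the paper's learned weighting.

```python
# Hedged semantic-apparent fusion sketch; names and weighting are illustrative.
import torch

def fuse_semantic_apparent(sem_feat, app_feat, w=0.6):
    """sem_feat: (B, C1, H, W) deep features; app_feat: (B, C2, H, W) low-level
    features. Each is reduced to a single-channel map, normalized, and fused."""
    def to_map(f):
        m = f.abs().mean(dim=1, keepdim=True)  # channel-average response
        m = m - m.amin(dim=(-2, -1), keepdim=True)
        return m / (m.amax(dim=(-2, -1), keepdim=True) + 1e-6)
    return w * to_map(sem_feat) + (1 - w) * to_map(app_feat)

sem = torch.randn(1, 512, 28, 28)
app = torch.randn(1, 64, 28, 28)
fg_map = fuse_semantic_apparent(sem, app)
```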