Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation
- URL: http://arxiv.org/abs/2503.06954v1
- Date: Mon, 10 Mar 2025 06:02:13 GMT
- Title: Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation
- Authors: Xingye Fan, Zhongwen Zhang, Yuri Boykov
- Abstract summary: Extending binary class tags to approximate relative object-size distributions allows off-the-shelf architectures to solve the segmentation problem. A straightforward zero-avoiding KL-divergence loss for average predictions produces segmentation accuracy comparable to the standard pixel-precise supervision. Our ideas are validated on PASCAL VOC using our new human annotations of approximate object sizes.
- Score: 52.239136918460616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper demonstrates a surprising result for segmentation with image-level targets: extending binary class tags to approximate relative object-size distributions allows off-the-shelf architectures to solve the segmentation problem. A straightforward zero-avoiding KL-divergence loss for average predictions produces segmentation accuracy comparable to the standard pixel-precise supervision with full ground truth masks. In contrast, current results based on class tags typically require complex non-reproducible architectural modifications and specialized multi-stage training procedures. Our ideas are validated on PASCAL VOC using our new human annotations of approximate object sizes. We also show the results on COCO and medical data using synthetically corrupted size targets. All standard networks demonstrate robustness to the size targets' errors. For some classes, the validation accuracy is significantly better than the pixel-level supervision; the latter is not robust to errors in the masks. Our work provides new ideas and insights on image-level supervision in segmentation and may encourage other simple general solutions to the problem.
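The loss named in the abstract can be illustrated with a minimal sketch. It assumes the zero-avoiding direction is KL(size target || average prediction), where the average prediction is the per-pixel softmax output averaged over the whole image. The function names (`softmax`, `size_target_kl_loss`), the `eps` smoothing constant, and the array layout are illustrative choices, not taken from the paper.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def size_target_kl_loss(logits, size_target, eps=1e-8):
    """KL(size_target || average prediction) for a single image.

    logits:      array of shape (H, W, K), per-pixel class scores.
    size_target: array of shape (K,), approximate relative object
                 sizes; non-negative and summing to 1.
    eps:         hypothetical smoothing constant that keeps the
                 logarithm finite when a predicted average is ~0.
    """
    probs = softmax(logits, axis=-1)
    # Average prediction: the fraction of the image softly assigned to each class.
    avg_pred = probs.reshape(-1, probs.shape[-1]).mean(axis=0)
    # Classes with zero target size contribute nothing to this KL direction.
    present = size_target > 0
    target = size_target[present]
    return float(np.sum(target * (np.log(target) - np.log(avg_pred[present] + eps))))


# Example: a 4x4 image with 3 classes and an untrained (all-zero) network output.
logits = np.zeros((4, 4, 3))
target = np.array([0.5, 0.5, 0.0])
loss = size_target_kl_loss(logits, target)
```

In this direction the loss grows without bound when the network assigns almost no pixels to a class that the target says is present, which is the behaviour usually meant by "zero-avoiding". A class with zero target size is not penalized directly; it is discouraged only because the averaged probabilities must sum to one.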
Related papers
- ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation [6.012828781329036]
We propose to explicitly model and rectify the bias existing in CLIP to facilitate the unsupervised semantic segmentation task. Specifically, we design a learnable "Reference" prompt to encode class-preference bias and a projection of the positional embedding in the vision transformer to encode space-preference bias. Our method performs favorably against previous state-of-the-arts.
arXiv Detail & Related papers (2024-08-13T09:10:48Z)
- Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
arXiv Detail & Related papers (2023-10-30T03:45:15Z)
- Long-tail Detection with Effective Class-Margins [4.18804572788063]
We show how the commonly used mean average precision evaluation metric on an unknown test set is bounded by a margin-based binary classification error.
We optimize margin-based binary classification error with a novel surrogate objective called Effective Class-Margin Loss (ECM).
arXiv Detail & Related papers (2023-01-23T21:25:24Z)
- CAR: Class-aware Regularizations for Semantic Segmentation [20.947897583427192]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
Our method can be easily applied to most existing segmentation models during training, including OCR and CPNet.
arXiv Detail & Related papers (2022-03-14T15:02:48Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Exposing Semantic Segmentation Failures via Maximum Discrepancy Competition [102.75463782627791]
We take steps toward answering the question by exposing failures of existing semantic segmentation methods in the open visual world.
Inspired by previous research on model falsification, we start from an arbitrarily large image set, and automatically sample a small image set by MAximizing the Discrepancy (MAD) between two segmentation methods.
The selected images have the greatest potential in falsifying either (or both) of the two methods.
A segmentation method whose failures are more difficult to expose in the MAD competition is considered better.
arXiv Detail & Related papers (2021-02-27T16:06:25Z)
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
arXiv Detail & Related papers (2021-02-11T18:54:47Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- A Weakly-Supervised Semantic Segmentation Approach based on the Centroid Loss: Application to Quality Control and Inspection [6.101839518775968]
We propose and assess a new weakly-supervised semantic segmentation approach making use of a novel loss function.
The performance of the approach is evaluated against datasets from two different industry-related case studies.
arXiv Detail & Related papers (2020-10-26T09:08:21Z)
- AinnoSeg: Panoramic Segmentation with High Perfomance [4.867465475957119]
Current panoramic segmentation algorithms focus mainly on context semantics and do not process image details sufficiently.
Aiming to address these issues, this paper presents some useful tricks.
Together, these operations are named AinnoSeg, which can achieve state-of-the-art performance on the well-known ADE20K dataset.
arXiv Detail & Related papers (2020-07-21T04:16:46Z)