Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
- URL: http://arxiv.org/abs/2504.04841v1
- Date: Mon, 07 Apr 2025 08:53:14 GMT
- Title: Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
- Authors: Sebastian Schmidt, Julius Körner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan Günnemann
- Abstract summary: We propose Prior2Former (P2F) as the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. It achieves the highest ranking in the OoDIS anomaly instance benchmark among methods not using OOD data in any way.
- Score: 74.55677741919035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In panoptic segmentation, individual instances must be separated within semantic classes. As state-of-the-art methods rely on a pre-defined set of classes, they struggle with novel categories and out-of-distribution (OOD) data. This is particularly problematic in safety-critical applications, such as autonomous driving, where reliability in unseen scenarios is essential. We address the gap between outstanding benchmark performance and reliability by proposing Prior2Former (P2F), the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. This design enables high-quality uncertainty estimation that effectively detects novel and OOD objects, enabling state-of-the-art anomaly instance segmentation and open-world panoptic segmentation. Unlike most segmentation models addressing unknown classes, P2F operates without access to OOD data samples or contrastive training on void (i.e., unlabeled) classes, making it highly applicable in real-world scenarios where such prior information is unavailable. Additionally, P2F can be flexibly applied to anomaly instance and panoptic segmentation. Through comprehensive experiments on the Cityscapes, COCO, SegmentMeIfYouCan, and OoDIS datasets, we demonstrate the state-of-the-art performance of P2F. It achieves the highest ranking in the OoDIS anomaly instance benchmark among methods not using OOD data in any way.
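The abstract mentions a Beta prior over pixel-wise binary mask assignments but does not give the formulation. As a rough illustration only, the following sketch shows one common way evidential models turn per-pixel evidence into a Beta distribution and an uncertainty score. The function name `beta_mask_uncertainty`, the uniform prior of 1.0, and the vacuity measure `2 * prior / (alpha + beta)` are assumptions borrowed from generic evidential learning, not the P2F implementation.

```python
import numpy as np

def beta_mask_uncertainty(pos_evidence, neg_evidence, prior=1.0):
    """Hypothetical helper: per-pixel Beta parameters from non-negative evidence.

    pos_evidence / neg_evidence are arrays of evidence that a pixel does /
    does not belong to a mask. This is a generic evidential sketch, not P2F.
    """
    pos_evidence = np.asarray(pos_evidence, dtype=float)
    neg_evidence = np.asarray(neg_evidence, dtype=float)

    # Posterior Beta parameters: prior pseudo-counts plus observed evidence.
    alpha = pos_evidence + prior
    beta = neg_evidence + prior
    strength = alpha + beta

    # Expected probability that the pixel belongs to the mask.
    mean = alpha / strength
    # Vacuity: share of the total strength contributed by the prior alone.
    # It equals 1.0 with no evidence and shrinks as evidence accumulates.
    vacuity = 2.0 * prior / strength
    return mean, vacuity

# Toy 2x2 "image": one confident foreground pixel, one confident background
# pixel, and two pixels with no evidence at all (candidate unknown regions).
pos = np.array([[20.0, 0.0], [0.0, 0.0]])
neg = np.array([[0.0, 20.0], [0.0, 0.0]])
mean, vacuity = beta_mask_uncertainty(pos, neg)
print(mean)
print(vacuity)
```

In this toy setup, pixels with no evidence keep a mean of 0.5 and the maximum vacuity, which is the kind of signal an uncertainty-based method could threshold to flag unknown objects without ever seeing OOD training data.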
Related papers
- A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns, but not both. We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments. The dataset is twice as large as existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z)
- Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving [7.064497253920508]
Vision Foundation Models (VFMs) as feature extractors and density modeling techniques are proposed. A comparison with state-of-the-art binary OOD classification methods reveals that VFM embeddings with density estimation outperform existing approaches in identifying OOD inputs. Our method detects high-risk inputs likely to cause errors in downstream tasks, thereby improving overall performance.
arXiv Detail & Related papers (2025-01-14T12:51:34Z)
- Multi-Scale Foreground-Background Confidence for Out-of-Distribution Segmentation [0.36832029288386137]
We present a multi-scale OOD segmentation method that exploits the confidence information of a foreground-background segmentation model. We consider the per-pixel confidence score of the model prediction, which is close to 1 for a pixel in a foreground object. By aggregating these confidence values for differently sized patches, objects of various sizes can be identified in a single image.
arXiv Detail & Related papers (2024-12-22T12:09:27Z)
- PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion [80.79938369319152]
We design a new pipeline coined PCF-Lift based on our Probabilistic Contrastive Fusion (PCF).
PCF-Lift significantly outperforms state-of-the-art methods on widely used benchmarks, including the ScanNet dataset and the Messy Room dataset (4.4% improvement in scene-level PQ).
arXiv Detail & Related papers (2024-10-14T16:06:59Z)
- Physically Feasible Semantic Segmentation [58.17907376475596]
State-of-the-art semantic segmentation models are typically optimized in a data-driven fashion, minimizing solely per-pixel or per-segment classification objectives on their training data. This purely data-driven paradigm often leads to absurd segmentations, especially when the domain of input images is shifted from the one encountered during training. Our method, Physically Feasible Semantic (PhyFea), first extracts explicit constraints that govern spatial class relations from the semantic segmentation training set at hand in an offline data-driven fashion, and then enforces a morphological yet differentiable loss that penalizes violations of these constraints during
arXiv Detail & Related papers (2024-08-26T22:39:08Z)
- Pixel-wise Gradient Uncertainty for Convolutional Neural Networks applied to Out-of-Distribution Segmentation [0.43512163406552007]
We present a method for obtaining uncertainty scores from pixel-wise loss gradients which can be computed efficiently during inference.
Our experiments show the ability of our method to identify wrong pixel classifications and to estimate prediction quality at negligible computational overhead.
arXiv Detail & Related papers (2023-03-13T08:37:59Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Anomaly-Aware Semantic Segmentation by Leveraging Synthetic-Unknown Data [19.80173687261055]
Anomaly awareness is essential for safety-critical applications such as autonomous driving.
We propose a novel Synthetic-Unknown Data Generation to tackle the anomaly-aware semantic segmentation task.
We reach the state-of-the-art performance on two anomaly segmentation datasets.
arXiv Detail & Related papers (2021-11-29T06:24:50Z)
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between the images and their neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.