Related papers: SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds

URL: http://arxiv.org/abs/2505.12155v2
Date: Tue, 27 May 2025 01:54:05 GMT
Title: SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds
Authors: Ranit Karmakar, Simon F. Nørrelykke
Abstract summary: We propose SoftPQ, a flexible and interpretable instance segmentation metric. We show that SoftPQ captures meaningful differences in segmentation quality that existing metrics overlook.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Segmentation evaluation metrics traditionally rely on binary decision logic: predictions are either correct or incorrect, based on rigid IoU thresholds. Detection-based metrics such as F1 and mAP determine correctness at the object level using fixed overlap cutoffs, while overlap-based metrics like Intersection over Union (IoU) and Dice operate at the pixel level, often overlooking instance-level structure. Panoptic Quality (PQ) attempts to unify detection and segmentation assessment, but it remains dependent on hard-threshold matching, treating predictions below the threshold as entirely incorrect. This binary framing obscures important distinctions between qualitatively different errors and fails to reward gradual model improvements. We propose SoftPQ, a flexible and interpretable instance segmentation metric that redefines evaluation as a graded continuum rather than a binary classification. SoftPQ introduces tunable upper and lower IoU thresholds to define a partial matching region and applies a sublinear penalty function to ambiguous or fragmented predictions. These extensions allow SoftPQ to exhibit smoother score behavior, greater robustness to structural segmentation errors, and more informative feedback for model development and evaluation. Through controlled perturbation experiments, we show that SoftPQ captures meaningful differences in segmentation quality that existing metrics overlook, making it a practical and principled alternative for both benchmarking and iterative model refinement.
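
The hard-threshold behavior that the abstract criticizes can be seen in a minimal sketch of standard Panoptic Quality, which is the baseline and not SoftPQ itself. The function name `panoptic_quality` and its input layout (one best-overlap IoU per candidate ground-truth/prediction pair, plus object counts) are assumptions made for illustration; the abstract does not give SoftPQ's formula, so no attempt is made to reproduce it here.

```python
def panoptic_quality(ious, num_pred, num_gt, threshold=0.5):
    """Standard PQ for one class (hypothetical helper, not from the paper).

    ious: IoU of each candidate (ground truth, prediction) pair, assuming
          every object appears in at most one pair.
    num_pred, num_gt: total number of predicted and ground-truth objects.
    """
    # A pair counts as a true positive only if its IoU exceeds the threshold.
    matched = [iou for iou in ious if iou > threshold]
    tp = len(matched)
    fp = num_pred - tp  # predictions left without a match
    fn = num_gt - tp    # ground-truth objects left without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched) / denom if denom > 0 else 0.0


# One object, one prediction: a tiny IoU change flips the score.
print(panoptic_quality([0.51], num_pred=1, num_gt=1))  # 0.51
print(panoptic_quality([0.49], num_pred=1, num_gt=1))  # 0.0
```

A prediction at IoU 0.49 is scored as one false positive plus one false negative, so it gets no credit even though it is nearly as good as one at 0.51. This is the discontinuity that a partial matching region between a lower and an upper threshold is meant to smooth.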

Related papers

GRCF: Two-Stage Groupwise Ranking and Calibration Framework for Multimodal Sentiment Analysis [20.77940776708036]
Pairwise ordinal learning frameworks capture relative order by learning from comparisons. They assign uniform importance to all comparisons, failing to adaptively focus on hard-to-rank samples. We propose a Two-Stage Groupwise Ranking and Calibration Framework (GRCF) that adapts the philosophy of Group Relative Policy Optimization. GRCF achieves state-of-the-art performance on core regression benchmarks, while also showing strong generalizability in classification tasks.
arXiv Detail & Related papers (2026-01-14T16:26:44Z)
MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z)
Calibratable Disambiguation Loss for Multi-Instance Partial-Label Learning [53.9713678229744]
Multi-instance partial-label learning (MIPL) is a weakly supervised framework that addresses the challenges of inexact supervision in both instance and label spaces. Existing MIPL approaches often suffer from poor calibration, undermining reliability. We propose a plug-and-play calibratable disambiguation loss (CDL) that simultaneously improves classification accuracy and calibration performance.
arXiv Detail & Related papers (2025-12-19T16:58:31Z)
The Multiclass Score-Oriented Loss (MultiSOL) on the Simplex [4.014524824655106]
In supervised binary classification, score-oriented losses have been introduced with the aim of optimizing a chosen performance metric directly during the training phase. In this paper, we use a recently introduced multidimensional threshold-based classification framework to extend such score-oriented losses to multiclass classification. As also demonstrated by several classification experiments, this proposed family of losses is designed to preserve the main advantages observed in the binary setting.
arXiv Detail & Related papers (2025-11-27T16:20:55Z)
Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs [1.2891210250935148]
This paper investigates automated skill decomposition using Large Language Models (LLMs). Our framework standardizes the pipeline from prompting and generation to normalization and alignment with ontology nodes. To evaluate outputs, we introduce two metrics: an F1-score that uses optimal embedding-based matching to assess content accuracy, and a hierarchy-aware F1-score that credits structurally correct placements to assess granularity.
arXiv Detail & Related papers (2025-10-13T12:03:06Z)
Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation. A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios. We introduce Continual AQA (CAQA), which equips AQA with continual learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z)
Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z)
Prototype-Guided Pseudo-Labeling with Neighborhood-Aware Consistency for Unsupervised Adaptation [12.829638461740759]
In unsupervised adaptation for vision-language models such as CLIP, pseudo-labels from zero-shot predictions often exhibit significant noise. We propose a novel adaptive pseudo-labeling framework that enhances CLIP's adaptation performance by integrating prototype consistency and neighborhood-based consistency. Our method achieves state-of-the-art performance in unsupervised adaptation scenarios, delivering more accurate pseudo-labels while maintaining computational efficiency.
arXiv Detail & Related papers (2025-07-22T19:08:24Z)
SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection. Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains. We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv Detail & Related papers (2025-03-05T09:37:05Z)
Robust Confinement State Classification with Uncertainty Quantification through Ensembled Data-Driven Methods [39.27649013012046]
We develop methods for confinement state classification with uncertainty quantification and model robustness. We focus on off-line analysis for TCV discharges, distinguishing L-mode, H-mode, and an in-between dithering phase (D). A dataset of 302 TCV discharges is fully labeled and will be publicly released.
arXiv Detail & Related papers (2025-02-24T18:25:22Z)
Improving action segmentation via explicit similarity measurement [5.303583360581161]
We propose an explicit similarity evaluation across frames and predictions to enhance the segmentation accuracy. Our supervised learning architecture uses frame-level multi-resolution features as input to Transformer encoders. We apply a newly proposed boundary correction algorithm that operates based on feature similarity between consecutive frames. We also propose a fully unsupervised boundary detection-correction that identifies segment boundaries based solely on feature similarity without any training.
arXiv Detail & Related papers (2025-02-15T08:02:38Z)
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance. DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator. Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks. We instruct an LLM to self-evaluate its answers. We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
Threshold-Consistent Margin Loss for Open-World Deep Metric Learning [42.03620337000911]
Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures. Inconsistency often complicates the threshold selection process when deploying commercial image retrieval systems. We propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes.
arXiv Detail & Related papers (2023-07-08T21:16:41Z)
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges. We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow [0.0]
Recent semantic segmentation models accurately classify test-time examples that are similar to a training dataset distribution. We propose a generative model for concurrent in-distribution misclassification (IDM) and OOD detection that relies on a normalizing flow framework. FlowEneDet achieves promising results on Cityscapes, Cityscapes-C, FishyScapes and SegmentMeIfYouCan benchmarks in IDM/OOD detection when applied to pretrained DeepLabV3+ and SegFormer semantic segmentation models.
arXiv Detail & Related papers (2023-05-16T17:02:57Z)
Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation. We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data. In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.