Semantic F1 Scores: Fair Evaluation Under Fuzzy Class Boundaries
- URL: http://arxiv.org/abs/2509.21633v1
- Date: Thu, 25 Sep 2025 21:48:48 GMT
- Title: Semantic F1 Scores: Fair Evaluation Under Fuzzy Class Boundaries
- Authors: Georgios Chochlakis, Jackson Trager, Vedant Jhaveri, Nikhil Ravichandran, Alexandros Potamianos, Shrikanth Narayanan
- Abstract summary: We propose Semantic F1 Scores, novel evaluation metrics for subjective or fuzzy multi-label classification. By granting partial credit for semantically related but nonidentical labels, Semantic F1 better reflects the realities of domains marked by human disagreement or fuzzy category boundaries.
- Score: 65.89202599399252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Semantic F1 Scores, novel evaluation metrics for subjective or fuzzy multi-label classification that quantify semantic relatedness between predicted and gold labels. Unlike the conventional F1 metrics that treat semantically related predictions as complete failures, Semantic F1 incorporates a label similarity matrix to compute soft precision-like and recall-like scores, from which the Semantic F1 scores are derived. Unlike existing similarity-based metrics, our novel two-step precision-recall formulation enables the comparison of label sets of arbitrary sizes without discarding labels or forcing matches between dissimilar labels. By granting partial credit for semantically related but nonidentical labels, Semantic F1 better reflects the realities of domains marked by human disagreement or fuzzy category boundaries. In this way, it provides fairer evaluations: it recognizes that categories overlap, that annotators disagree, and that downstream decisions based on similar predictions lead to similar outcomes. Through theoretical justification and extensive empirical validation on synthetic and real data, we show that Semantic F1 demonstrates greater interpretability and ecological validity. Because it requires only a domain-appropriate similarity matrix, which is robust to misspecification, and not a rigid ontology, it is applicable across tasks and modalities.
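The soft precision-recall construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact two-step formulation: it credits each predicted label with its best similarity to any gold label (and vice versa) under a symmetric similarity matrix; the function name and the toy emotion labels are hypothetical.

```python
def semantic_f1(predicted, gold, sim):
    """Semantic F1 between two label sets under a similarity matrix.

    sim[a][b] in [0, 1] scores the relatedness of labels a and b,
    with sim[a][a] == 1. Exact matches still score 1.0; related but
    nonidentical labels earn partial credit instead of zero.
    """
    if not predicted or not gold:
        # No matching is possible; two empty sets count as a perfect match.
        return 1.0 if predicted == gold else 0.0
    # Soft precision: each predicted label is credited with its best
    # similarity to any gold label.
    precision = sum(max(sim[p][g] for g in gold) for p in predicted) / len(predicted)
    # Soft recall: each gold label is credited with its best similarity
    # to any predicted label.
    recall = sum(max(sim[p][g] for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical similarity matrix over three emotion labels.
SIM = {
    "joy":   {"joy": 1.0, "love": 0.8, "anger": 0.0},
    "love":  {"joy": 0.8, "love": 1.0, "anger": 0.1},
    "anger": {"joy": 0.0, "love": 0.1, "anger": 1.0},
}
```

Note how set sizes need not match: predicting {"joy", "anger"} against gold {"joy"} is scored without discarding either label. Under conventional F1, predicting {"love"} against gold {"joy"} would score 0; here it scores 0.8, reflecting the labels' semantic closeness.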
Related papers
- Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity [42.873412319680035]
This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for Large Language Model outputs. We generate benchmark datasets in four different domains (general knowledge, biomedicine, finance, biology). We observe that the sub-type of semantic variation, as well as the domain of the benchmark, impact the performance of semantic similarity methods.
arXiv Detail & Related papers (2025-11-25T05:07:08Z)
- Weakly-Supervised Contrastive Learning for Imprecise Class Labels [50.57424331797865]
We introduce the concept of "continuous semantic similarity" to define positive and negative pairs. We propose a graph-theoretic framework for weakly-supervised contrastive learning. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios.
arXiv Detail & Related papers (2025-05-28T06:50:40Z)
- sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification [42.37189502220329]
We propose a loss function, sigmoidF1, to account for the complexity of multilabel classification evaluation.
We show that sigmoidF1 outperforms other loss functions on four datasets and several metrics.
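The general idea behind a smooth F1 surrogate can be sketched as follows: replace the hard 0/1 thresholding in the confusion counts with sigmoid-activated scores, so the resulting F1 is differentiable and usable as a training loss. This is a simplified illustration of the approach, not the paper's exact loss (the paper tunes the sigmoid's slope and offset); the function names are hypothetical.

```python
import math

def sigmoid(u, beta=1.0):
    """Logistic sigmoid with a tunable slope beta."""
    return 1.0 / (1.0 + math.exp(-beta * u))

def sigmoid_f1_loss(logits, targets, beta=1.0):
    """1 - soft-F1, computed from sigmoid-activated scores.

    logits: raw per-label scores; targets: binary relevance (0/1).
    """
    s = [sigmoid(u, beta) for u in logits]
    # Soft confusion counts: fractional credit instead of hard 0/1.
    tp = sum(si * yi for si, yi in zip(s, targets))
    fp = sum(si * (1 - yi) for si, yi in zip(s, targets))
    fn = sum((1 - si) * yi for si, yi in zip(s, targets))
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)  # epsilon avoids 0/0
    return 1.0 - f1
```

Because every term is smooth in the logits, the loss can be minimized directly by gradient descent, unlike F1 computed on thresholded predictions.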
arXiv Detail & Related papers (2021-08-24T08:11:33Z)
- A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning [111.05365744744437]
Unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
arXiv Detail & Related papers (2021-06-28T14:24:52Z)
- Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
We propose a general pseudo-labeling framework to address the bias motivated by this observation.
We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z)
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
arXiv Detail & Related papers (2020-07-01T04:25:24Z)
- Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation [49.295165476818866]
This paper focuses on the unsupervised domain adaptation of transferring the knowledge from the source domain to the target domain in the context of semantic segmentation.
Existing approaches usually regard the pseudo label as the ground truth to fully exploit the unlabeled target-domain data.
This paper proposes to explicitly estimate the prediction uncertainty during training to rectify the pseudo label learning.
arXiv Detail & Related papers (2020-03-08T12:37:19Z)
- A Quadruplet Loss for Enforcing Semantically Coherent Embeddings in Multi-output Classification Problems [5.972927416266617]
This paper describes one objective function for learning semantically coherent feature embeddings in multi-output classification problems.
We consider the problems of identity retrieval and soft biometrics labelling in visual surveillance environments.
arXiv Detail & Related papers (2020-02-26T17:18:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.