Preserving Empirical Probabilities in BERT for Small-sample Clinical Entity Recognition
- URL: http://arxiv.org/abs/2409.03238v1
- Date: Thu, 5 Sep 2024 04:38:49 GMT
- Title: Preserving Empirical Probabilities in BERT for Small-sample Clinical Entity Recognition
- Authors: Abdul Rehman, Jian Jun Zhang, Xiaosong Yang,
- Abstract summary: This paper explores the effects of unbalanced entity labels of the BERT-based pre-trained model.
We analyze the different mechanisms of loss calculation and loss propagation for the task of token classification on randomized datasets.
- Score: 5.466034990848432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named Entity Recognition (NER) encounters the challenge of unbalanced labels, where certain entity types are overrepresented while others are underrepresented in real-world datasets. This imbalance can lead to biased models that perform poorly on minority entity classes, impeding accurate and equitable entity recognition. This paper explores the effects of unbalanced entity labels of the BERT-based pre-trained model. We analyze the different mechanisms of loss calculation and loss propagation for the task of token classification on randomized datasets. Then we propose ways to improve the token classification for the highly imbalanced task of clinical entity recognition.
Related papers
- Energy Score-based Pseudo-Label Filtering and Adaptive Loss for Imbalanced Semi-supervised SAR target recognition [1.2035771704626825]
Existing semi-supervised SAR ATR algorithms show low recognition accuracy in the case of class imbalance.
This work offers a non-balanced semi-supervised SAR target recognition approach using dynamic energy scores and adaptive loss.
arXiv Detail & Related papers (2024-11-06T14:45:16Z) - Pedestrian Attribute Recognition as Label-balanced Multi-label Learning [12.605514698358165]
We propose a novel framework that successfully decouples label-balanced data re-sampling from the curse of attributes co-occurrence.
Our work achieves best accuracy on various popular benchmarks, and importantly, with minimal computational budget.
arXiv Detail & Related papers (2024-05-08T07:27:15Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z) - Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised
Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations.
Recent advances accomplish this task by leveraging clustering-based pseudo labels.
We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z) - Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit, for neural net classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Exploiting Sample Uncertainty for Domain Adaptive Person
Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z) - Data Augmentation Imbalance For Imbalanced Attribute Classification [60.71438625139922]
We propose a new re-sampling algorithm called: data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the fewer attributes.
Our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets.
arXiv Detail & Related papers (2020-04-19T20:43:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.