Cost-Quality Adaptive Active Learning for Chinese Clinical Named Entity
Recognition
- URL: http://arxiv.org/abs/2008.12548v1
- Date: Fri, 28 Aug 2020 09:27:43 GMT
- Title: Cost-Quality Adaptive Active Learning for Chinese Clinical Named Entity
Recognition
- Authors: Tingting Cai, Yangming Zhou, Hong Zheng
- Abstract summary: We propose a Cost-Quality Adaptive Active Learning (CQAAL) approach for Clinical Named Entity Recognition (CNER) in Chinese EHRs.
CQAAL selects cost-effective instance-labeler pairs to achieve better annotation quality with lower costs in an adaptive manner.
- Score: 4.227856561940623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical Named Entity Recognition (CNER) aims to automatically
identify clinical terminologies in Electronic Health Records (EHRs), which is a
fundamental and crucial step for clinical research. To train a high-performance
model for CNER, it usually requires a large number of EHRs with high-quality
labels. However, labeling EHRs, especially Chinese EHRs, is time-consuming and
expensive. One effective solution is active learning, where the model asks
labelers to annotate the data it is most uncertain about. Conventional active
learning assumes a single labeler who always returns noise-free answers to
label queries. In real settings, however, multiple labelers provide annotations
of varying quality at varying costs, and labelers with low overall annotation
quality can still assign correct labels to some specific instances.
In this paper, we propose a Cost-Quality Adaptive Active Learning (CQAAL)
approach for CNER in Chinese EHRs, which maintains a balance between the
annotation quality, labeling costs, and the informativeness of selected
instances. Specifically, CQAAL selects cost-effective instance-labeler pairs to
achieve better annotation quality with lower costs in an adaptive manner.
Computational results on the CCKS-2017 Task 2 benchmark dataset demonstrate the
superiority and effectiveness of the proposed CQAAL.
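The selection idea described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual formulation: the utility function (entropy-based informativeness times expected labeler quality, minus a cost penalty), the `tradeoff` knob, and all names are assumptions introduced here for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Labeler:
    name: str
    cost: float     # price per annotation
    quality: float  # estimated probability of a correct label, in (0, 1]

def entropy(probs):
    """Predictive entropy: a standard uncertainty/informativeness score."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_pair(instances, labelers, tradeoff=1.0):
    """Pick the (instance, labeler) pair with the highest utility:
    informativeness weighted by expected quality, discounted by cost.
    `tradeoff` controls how strongly cost is penalized (assumed knob)."""
    best, best_utility = None, -math.inf
    for inst_id, probs in instances.items():
        info = entropy(probs)
        for lab in labelers:
            utility = info * lab.quality - tradeoff * lab.cost
            if utility > best_utility:
                best, best_utility = (inst_id, lab.name), utility
    return best

# Toy example: two unlabeled instances with model class probabilities,
# one cheap/noisy labeler and one expensive/accurate labeler.
instances = {"s1": [0.5, 0.5], "s2": [0.95, 0.05]}
labelers = [Labeler("crowd", cost=0.1, quality=0.7),
            Labeler("expert", cost=1.0, quality=0.98)]
print(select_pair(instances, labelers, tradeoff=0.3))
# → ('s1', 'crowd'): the uncertain instance is queried, and the cheap
# labeler's quality suffices to make it the most cost-effective pair.
```

Under this toy utility, the expensive expert is only chosen when informativeness is high enough that the quality gain outweighs the cost penalty, which mirrors the cost-quality balance the abstract describes.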
Related papers
- Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models [10.699636123243138]
In-context learning (ICL) performance is sensitive to the prompt design, yet the impact of class label options in zero-shot classification has been largely overlooked.
This study presents the first comprehensive empirical study investigating how label option influences zero-shot ICL classification performance.
arXiv Detail & Related papers (2024-10-24T22:59:23Z)
- Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) robustly leverages lower-quality weak labelers to reduce query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
arXiv Detail & Related papers (2022-11-04T02:52:54Z)
- Eliciting and Learning with Soft Labels from Every Annotator [31.10635260890126]
We focus on efficiently eliciting soft labels from individual annotators.
We demonstrate that learning with our labels achieves comparable model performance to prior approaches.
arXiv Detail & Related papers (2022-07-02T12:03:00Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Active label cleaning: Improving dataset quality under resource constraints [13.716577886649018]
Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models.
This work advocates for a data-driven approach to prioritising samples for re-annotation.
We rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy.
arXiv Detail & Related papers (2021-09-01T19:03:57Z)
- Rethinking Pseudo Labels for Semi-Supervised Object Detection [84.697097472401]
We introduce certainty-aware pseudo labels tailored for object detection.
We dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem.
Our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.
arXiv Detail & Related papers (2021-06-01T01:32:03Z)
- Cost-Accuracy Aware Adaptive Labeling for Active Learning [9.761953860259942]
In many real settings, different labelers have different labeling costs and can yield different labeling accuracies.
We propose a new algorithm for selecting instances, labelers and their corresponding costs and labeling accuracies.
Our proposed algorithm demonstrates state-of-the-art performance on five UCI and a real crowdsourcing dataset.
arXiv Detail & Related papers (2021-05-24T17:21:00Z)
- Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
arXiv Detail & Related papers (2021-04-12T18:11:17Z)
- Semi-Supervised Speech Recognition via Graph-based Temporal Classification [59.58318952000571]
Semi-supervised learning has demonstrated promising results in automatic speech recognition by self-training.
The effectiveness of this approach largely relies on the pseudo-label accuracy.
Alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance.
arXiv Detail & Related papers (2020-10-29T14:56:56Z)
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.