Students Need More Attention: BERT-based Attention Model for Small Data
with Application to Automatic Patient Message Triage
- URL: http://arxiv.org/abs/2006.11991v1
- Date: Mon, 22 Jun 2020 03:39:00 GMT
- Title: Students Need More Attention: BERT-based Attention Model for Small Data
with Application to Automatic Patient Message Triage
- Authors: Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang,
Ricardo Henao, and Lawrence Carin
- Abstract summary: We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
- Score: 65.7062363323781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small and imbalanced datasets commonly seen in healthcare represent a
challenge when training classifiers based on deep learning models. So
motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder
Representations from Transformers for Biomedical Text Mining). Specifically, (i)
we introduce Label Embeddings for Self-Attention in each layer of BERT, which
we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim
to reduce overfitting and model size when working on small datasets. As an
application, our framework is utilized to build a model for patient portal
message triage that classifies the urgency of a message into three categories:
non-urgent, medium and urgent. Experiments demonstrate that our approach can
outperform several strong baseline classifiers by a significant margin of 4.3%
in terms of macro F1 score. The code for this project is publicly available at
https://github.com/shijing001/text_classifiers.
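To make the two ideas in the abstract concrete, the sketch below shows (i) a self-attention layer in which trainable label embeddings are appended to the token sequence so that tokens and the three triage classes attend to each other, and (ii) a standard soft-target distillation loss for training a smaller student. This is a minimal PyTorch illustration, not the authors' released implementation (see the repository above); the layer sizes, class count, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelAugmentedSelfAttention(nn.Module):
    """Self-attention over token states with per-class label embeddings appended,
    so tokens and labels can attend to one another (illustrative assumption)."""

    def __init__(self, hidden_size=768, num_heads=12, num_labels=3):
        super().__init__()
        # One trainable embedding per triage class (non-urgent, medium, urgent).
        self.label_embeddings = nn.Parameter(torch.randn(num_labels, hidden_size) * 0.02)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size), e.g. the output of a BERT layer.
        batch, seq_len, _ = hidden_states.shape
        labels = self.label_embeddings.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([hidden_states, labels], dim=1)   # append label "tokens"
        out, _ = self.attn(x, x, x)                     # joint token/label self-attention
        return out[:, :seq_len]                         # keep only the token positions


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Soft-target KL term plus hard-label cross-entropy, as in standard knowledge distillation."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    layer = LabelAugmentedSelfAttention()
    tokens = torch.randn(2, 16, 768)                  # toy batch of hidden states
    print(layer(tokens).shape)                        # torch.Size([2, 16, 768])
    student = torch.randn(2, 3)                       # logits over the 3 triage classes
    teacher = torch.randn(2, 3)
    gold = torch.tensor([0, 2])
    print(distillation_loss(student, teacher, gold))
```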
Related papers
- A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation [22.440065488051047]
A key challenge for the widespread application of learning-based models in robotic perception is significantly reducing the required amount of annotated training data.
We exploit the groundwork paved by visual foundation models to train two lightweight network heads for semantic segmentation and object boundary detection.
We demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
arXiv Detail & Related papers (2024-05-29T12:23:29Z)
- Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts [16.05119302860606]
We investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification.
The methods applied include domain fine-tuning and prompt-based learning (PBL).
We used a range of automatic evaluation metrics, including BLEU, ROUGE, SARI, and BERTScore, and also conducted human evaluations.
arXiv Detail & Related papers (2023-09-22T22:47:32Z)
- Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z)
- BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification [0.5156484100374058]
We propose BERT-Flow-VAE (BFV), a Weakly-Supervised Multi-Label Text Classification model that reduces the need for full supervision.
Experimental results on 6 multi-label datasets show that BFV can substantially outperform other baseline WSMLTC models in key metrics.
arXiv Detail & Related papers (2022-10-27T07:18:56Z)
- Semi-supervised 3D Object Detection with Proficient Teachers [114.54835359657707]
Dominant point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on large amounts of accurately labeled samples.
The pseudo-labeling methodology is commonly used in SSL frameworks; however, low-quality predictions from the teacher model severely limit its performance.
We propose a new Pseudo-Labeling framework for semi-supervised 3D object detection, by enhancing the teacher model to a proficient one with several necessary designs.
arXiv Detail & Related papers (2022-07-26T04:54:03Z)
- Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models [3.303435360096988]
We perform a knowledge distillation benchmark from task-specific BERT-base teacher models to various student models.
Our experiment involves 12 datasets grouped into two tasks: text classification and sequence labeling in the Indonesian language.
Our experiments show that, despite the rising popularity of Transformer-based models, using BiLSTM and CNN student models provides the best trade-off between performance and computational resources.
arXiv Detail & Related papers (2022-01-03T10:07:13Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- A Hybrid Approach to Measure Semantic Relatedness in Biomedical Concepts [0.0]
We generated concept vectors by encoding concept preferred terms using ELMo, BERT, and Sentence BERT models.
We trained all the BERT models using a Siamese network on the SNLI and STSb datasets to allow the models to learn more semantic information.
Injecting ontology knowledge into concept vectors further enhances their quality and contributes to better relatedness scores.
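As a rough illustration of comparing concept vectors, the sketch below encodes two concept preferred terms with an off-the-shelf Sentence-BERT model and scores their relatedness by cosine similarity; the model name and terms are stand-ins, not the trained or ontology-enhanced vectors used in that paper.

```python
# Minimal sketch: cosine relatedness between concept preferred terms using an
# off-the-shelf Sentence-BERT encoder (a stand-in, not the paper's models).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")    # generic sentence encoder (assumption)
terms = ["myocardial infarction", "heart attack"]  # example concept preferred terms
embeddings = model.encode(terms, convert_to_tensor=True)

# Cosine similarity of the two concept vectors serves as a relatedness score.
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"relatedness({terms[0]!r}, {terms[1]!r}) = {score:.3f}")
```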
arXiv Detail & Related papers (2021-01-25T16:01:27Z)
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data.
Based on experimental results, neural semantic parsers that leverage the GAP model obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.