BagBERT: BERT-based bagging-stacking for multi-topic classification
- URL: http://arxiv.org/abs/2111.05808v1
- Date: Wed, 10 Nov 2021 17:00:36 GMT
- Title: BagBERT: BERT-based bagging-stacking for multi-topic classification
- Authors: Lo\"ic Rakotoson, Charles Letaillieur, Sylvain Massip and Fr\'ejus
Laleye
- Abstract summary: We propose an approach that exploits knowledge from globally non-optimal weights, which are usually discarded, to build a rich representation of each label.
Aggregating these weak insights performs better than a classical, globally efficient model.
Our system obtains an Instance-based F1 of 92.96 and a Label-based micro-F1 of 91.35.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our submission to the COVID-19 literature annotation
task at BioCreative VII. We propose an approach that exploits knowledge from
globally non-optimal weights, which are usually discarded, to build a rich
representation of each label. Our proposed approach consists of two stages: (1)
a bagging of various initializations of the training data, which yields weakly
trained weights, and (2) a stacking of heterogeneous-vocabulary models based on
BERT and RoBERTa embeddings. Aggregating these weak insights performs better
than a classical, globally efficient model. The goal is to distill this rich
knowledge into a simpler and lighter model. Our system obtains an
Instance-based F1 of 92.96 and a Label-based micro-F1 of 91.35.
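The two-stage aggregation can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding (array shapes, checkpoint counts, and the logistic-regression stacker are assumptions, not the authors' implementation): stage 1 averages per-label probabilities over several weakly trained checkpoints of each backbone, and stage 2 stacks the backbone outputs with a light meta-classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

n_docs, n_labels = 200, 7          # illustrative sizes, not the task's real dimensions
rng = np.random.default_rng(0)

def bag_probs(k_checkpoints):
    """Stage 1 (bagging): average per-label probabilities over K weakly trained
    checkpoints of one backbone (stand-in for BERT/RoBERTa sigmoid outputs)."""
    runs = rng.random((k_checkpoints, n_docs, n_labels))
    return runs.mean(axis=0)                                  # (n_docs, n_labels)

bert_probs = bag_probs(k_checkpoints=5)
roberta_probs = bag_probs(k_checkpoints=5)

# Stage 2 (stacking): concatenate the backbone outputs and fit a light meta-model.
meta_features = np.hstack([bert_probs, roberta_probs])        # (n_docs, 2 * n_labels)
y_train = (rng.random((n_docs, n_labels)) > 0.5).astype(int)  # dummy multi-label targets
stacker = MultiOutputClassifier(LogisticRegression(max_iter=1000))
stacker.fit(meta_features, y_train)
final_preds = stacker.predict(meta_features)                  # (n_docs, n_labels)
```

In practice the stacker would be trained on held-out predictions, and the averaged checkpoints would come from different seeds and shuffles of the training data; the point is only that many rejected, weakly trained weights can be pooled into a richer label representation than any single best checkpoint.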
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio [0.9208007322096533]
This research aims to classify numerical values extracted from medical documents across seven physiological categories.
We introduce two main innovations: integrating keyword embeddings into the model and adopting a number-agnostic strategy.
We show substantial improvements in the effectiveness of CamemBERT-bio, surpassing conventional methods with an F1 score of 0.89.
arXiv Detail & Related papers (2024-05-28T01:15:21Z) - Learning under Label Proportions for Text Classification [13.29710879730948]
We present one of the first NLP works under the challenging setup of Learning from Label Proportions (LLP).
The data is provided in an aggregated form, called bags, with only the proportion of samples in each class available as the ground truth.
arXiv Detail & Related papers (2023-10-18T04:39:25Z) - Generative Calibration for In-context Learning [20.207930451266822]
In this paper, we identify that such a paradox mainly arises from the label shift of the in-context model relative to the data distribution.
With this understanding, we can calibrate the in-context predictive distribution by adjusting the label marginal.
We call our approach generative calibration. We conduct exhaustive experiments on 12 text classification tasks with 12 LLMs ranging from 774M to 33B parameters.
arXiv Detail & Related papers (2023-10-16T10:45:02Z) - Balancing Efficiency vs. Effectiveness and Providing Missing Label
Robustness in Multi-Label Stream Classification [3.97048491084787]
We propose a neural network-based approach to high-dimensional multi-label classification.
Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment.
We adapt our model to an environment with missing labels using a simple yet effective imputation strategy.
arXiv Detail & Related papers (2023-10-01T13:23:37Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further elaborate the robustness metric: a model is judged to be robust only if its performance is consistently accurate across entire cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Bag of Tricks for Effective Language Model Pretraining and Downstream
Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks and achieves the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - Improving Label Quality by Jointly Modeling Items and Annotators [68.8204255655161]
We propose a fully Bayesian framework for learning ground truth labels from noisy annotators.
Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model (see the sketch after this list).
arXiv Detail & Related papers (2021-06-20T02:15:20Z) - UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for
Structuring Scholarly NLP Contributions [1.5942130010323128]
We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction.
A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences.
Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation.
arXiv Detail & Related papers (2021-05-12T05:24:35Z) - Students Need More Attention: BERT-based Attention Model for Small Data
with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
(i) We introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) we distill LESA-BERT into smaller variants to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)
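For reference, the "classic Dawid and Skene joint annotator-data model" cited in the label-quality entry above is an EM procedure over per-annotator confusion matrices. Below is a minimal, self-contained sketch of that classic model only (the cited paper wraps it in a Bayesian soft-clustering framework); the variable names and the missing-vote convention are assumptions.

```python
import numpy as np

def dawid_skene(votes, n_classes, n_iters=50):
    """Classic Dawid-Skene EM. votes: (n_items, n_annotators) label ids, -1 = missing."""
    n_items, n_annotators = votes.shape
    # Initialize posteriors over true labels with a smoothed per-item vote count.
    T = np.full((n_items, n_classes), 1e-6)
    for i in range(n_items):
        for v in votes[i]:
            if v >= 0:
                T[i, v] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: class priors and per-annotator confusion matrices.
        prior = T.mean(axis=0)
        conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
        for a in range(n_annotators):
            for i in range(n_items):
                v = votes[i, a]
                if v >= 0:
                    conf[a, :, v] += T[i]          # rows: true class, cols: observed label
            conf[a] /= conf[a].sum(axis=1, keepdims=True)
        # E-step: recompute label posteriors from the priors and confusions.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for a in range(n_annotators):
            for i in range(n_items):
                v = votes[i, a]
                if v >= 0:
                    log_post[i] += np.log(conf[a, :, v])
        T = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T  # posterior over true labels, one row per item

# Toy usage: 3 items, 3 annotators, -1 marks a missing vote.
votes = np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]])
posteriors = dawid_skene(votes, n_classes=2)
```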
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all generated content) and is not responsible for any consequences of its use.