BagBERT: BERT-based bagging-stacking for multi-topic classification
- URL: http://arxiv.org/abs/2111.05808v1
- Date: Wed, 10 Nov 2021 17:00:36 GMT
- Title: BagBERT: BERT-based bagging-stacking for multi-topic classification
- Authors: Lo\"ic Rakotoson, Charles Letaillieur, Sylvain Massip and Fr\'ejus
Laleye
- Abstract summary: We propose an approach that exploits knowledge from globally non-optimal weights, which are usually discarded, to build a rich representation of each label.
Aggregating these weak insights performs better than a classical, globally efficient model.
Our system obtains an Instance-based F1 of 92.96 and a Label-based micro-F1 of 91.35.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our submission to the COVID-19 literature annotation
task at BioCreative VII. We propose an approach that exploits knowledge from
globally non-optimal weights, which are usually discarded, to build a rich
representation of each label. Our proposed approach consists of two stages: (1)
a bagging of various initializations of the training data, which yields weakly
trained weights, and (2) a stacking of heterogeneous-vocabulary models based on
BERT and RoBERTa embeddings. Aggregating these weak insights performs better
than a classical, globally efficient model. The goal is to distill this rich
knowledge into a simpler and lighter model. Our system obtains an
Instance-based F1 of 92.96 and a Label-based micro-F1 of 91.35.
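The two-stage aggregation can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding (array shapes, checkpoint counts, and the logistic-regression stacker are assumptions, not the authors' implementation): stage 1 averages per-label probabilities over several weakly trained checkpoints of each backbone, and stage 2 stacks the backbone outputs with a light meta-classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

n_docs, n_labels = 200, 7          # illustrative sizes, not the task's real dimensions
rng = np.random.default_rng(0)

def bag_probs(k_checkpoints):
    """Stage 1 (bagging): average per-label probabilities over K weakly trained
    checkpoints of one backbone (stand-in for BERT/RoBERTa sigmoid outputs)."""
    runs = rng.random((k_checkpoints, n_docs, n_labels))
    return runs.mean(axis=0)                                  # (n_docs, n_labels)

bert_probs = bag_probs(k_checkpoints=5)
roberta_probs = bag_probs(k_checkpoints=5)

# Stage 2 (stacking): concatenate the backbone outputs and fit a light meta-model.
meta_features = np.hstack([bert_probs, roberta_probs])        # (n_docs, 2 * n_labels)
y_train = (rng.random((n_docs, n_labels)) > 0.5).astype(int)  # dummy multi-label targets
stacker = MultiOutputClassifier(LogisticRegression(max_iter=1000))
stacker.fit(meta_features, y_train)
final_preds = stacker.predict(meta_features)                  # (n_docs, n_labels)
```

In practice the stacker would be trained on held-out predictions, and the averaged checkpoints would come from different seeds and shuffles of the training data; the point is only that many rejected, weakly trained weights can be pooled into a richer label representation than any single best checkpoint.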
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio [0.9208007322096533]
This research aims to classify numerical values extracted from medical documents across seven physiological categories.
We introduce two main innovations: integrating keyword embeddings into the model and adopting a number-agnostic strategy.
We show substantial improvements in the effectiveness of CamemBERT-bio, surpassing conventional methods with an F1 score of 0.89.
arXiv Detail & Related papers (2024-05-28T01:15:21Z) - Learning under Label Proportions for Text Classification [13.29710879730948]
We present one of the first NLP works under the challenging setup of Learning from Label Proportions (LLP).
The data is provided in an aggregated form, called bags, with only the proportion of samples in each class available as the ground truth.
arXiv Detail & Related papers (2023-10-18T04:39:25Z) - Generative Calibration for In-context Learning [20.207930451266822]
In this paper, we identify that such a paradox mainly arises from the label shift of the in-context model relative to the data distribution.
With this understanding, we can calibrate the in-context predictive distribution by adjusting the label marginal.
We call our approach generative calibration. We conduct exhaustive experiments on 12 text classification tasks with 12 LLMs ranging from 774M to 33B parameters.
arXiv Detail & Related papers (2023-10-16T10:45:02Z) - Balancing Efficiency vs. Effectiveness and Providing Missing Label
Robustness in Multi-Label Stream Classification [3.97048491084787]
We propose a neural network-based approach to high-dimensional multi-label classification.
Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment.
We adapt our model to an environment with missing labels using a simple yet effective imputation strategy.
arXiv Detail & Related papers (2023-10-01T13:23:37Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further elaborate the robustness metric: a model is judged to be robust only if its performance is consistently accurate across entire cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Bag of Tricks for Effective Language Model Pretraining and Downstream
Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks and achieves the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - Improving Label Quality by Jointly Modeling Items and Annotators [68.8204255655161]
We propose a fully Bayesian framework for learning ground truth labels from noisy annotators.
Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model (see the sketch after this list).
arXiv Detail & Related papers (2021-06-20T02:15:20Z) - UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for
Structuring Scholarly NLP Contributions [1.5942130010323128]
We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction.
A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences.
Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation.
arXiv Detail & Related papers (2021-05-12T05:24:35Z) - Students Need More Attention: BERT-based Attention Model for Small Data
with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
(i) We introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) we distill LESA-BERT into smaller variants to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)
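For reference, the "classic Dawid and Skene joint annotator-data model" cited in the label-quality entry above is an EM procedure over per-annotator confusion matrices. Below is a minimal, self-contained sketch of that classic model only (the cited paper wraps it in a Bayesian soft-clustering framework); the variable names and the missing-vote convention are assumptions.

```python
import numpy as np

def dawid_skene(votes, n_classes, n_iters=50):
    """Classic Dawid-Skene EM. votes: (n_items, n_annotators) label ids, -1 = missing."""
    n_items, n_annotators = votes.shape
    # Initialize posteriors over true labels with a smoothed per-item vote count.
    T = np.full((n_items, n_classes), 1e-6)
    for i in range(n_items):
        for v in votes[i]:
            if v >= 0:
                T[i, v] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: class priors and per-annotator confusion matrices.
        prior = T.mean(axis=0)
        conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
        for a in range(n_annotators):
            for i in range(n_items):
                v = votes[i, a]
                if v >= 0:
                    conf[a, :, v] += T[i]          # rows: true class, cols: observed label
            conf[a] /= conf[a].sum(axis=1, keepdims=True)
        # E-step: recompute label posteriors from the priors and confusions.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for a in range(n_annotators):
            for i in range(n_items):
                v = votes[i, a]
                if v >= 0:
                    log_post[i] += np.log(conf[a, :, v])
        T = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T  # posterior over true labels, one row per item

# Toy usage: 3 items, 3 annotators, -1 marks a missing vote.
votes = np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]])
posteriors = dawid_skene(votes, n_classes=2)
```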
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all generated content) and is not responsible for any consequences of its use.