Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
- URL: http://arxiv.org/abs/2010.05763v1
- Date: Mon, 12 Oct 2020 14:56:22 GMT
- Title: Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
- Authors: Nikolaos Manginas, Ilias Chalkidis and Prodromos Malakasiotis
- Abstract summary: We propose a novel approach to fine-tune BERT in a structured manner.
Specifically, we focus on Large Scale Multilabel Text Classification (LMTC).
Our approach guides specific BERT layers to predict labels from specific hierarchy levels.
- Score: 11.46458298316499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although BERT is widely used by the NLP community, little is known about its
inner workings. Several attempts have been made to shed light on certain
aspects of BERT, often with contradicting conclusions. A much raised concern
focuses on BERT's over-parameterization and under-utilization issues. To this
end, we propose a novel approach to fine-tune BERT in a structured manner.
Specifically, we focus on Large Scale Multilabel Text Classification (LMTC)
where documents are assigned one or more labels from a large predefined
set of hierarchically organized labels. Our approach guides specific BERT
layers to predict labels from specific hierarchy levels. Experimenting with two
LMTC datasets, we show that this structured fine-tuning approach not only yields
better classification results but also leads to better parameter utilization.
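
A minimal sketch of the layer-wise guidance idea described in the abstract above: auxiliary classification heads are attached to selected BERT layers, and each head is trained on the labels of one hierarchy level. The layer-to-level assignment, the linear head form, the equal loss weighting, the hierarchy level sizes, and the bert-base-uncased checkpoint are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch (not the authors' code): per-level heads on intermediate BERT layers.
import torch
import torch.nn as nn
from transformers import AutoModel

class LayerwiseGuidedBERT(nn.Module):
    def __init__(self, model_name="bert-base-uncased",
                 labels_per_level=(10, 100, 1000),   # hypothetical hierarchy level sizes
                 guided_layers=(4, 8, 12)):          # assumed layer-to-level assignment
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.guided_layers = guided_layers
        # One multi-label classification head per hierarchy level.
        self.heads = nn.ModuleList([nn.Linear(hidden, n) for n in labels_per_level])

    def forward(self, input_ids, attention_mask, level_targets=None):
        out = self.bert(input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        # hidden_states[0] is the embedding output, so encoder layer i is hidden_states[i].
        logits = [head(out.hidden_states[layer][:, 0])          # [CLS] vector of that layer
                  for head, layer in zip(self.heads, self.guided_layers)]
        if level_targets is None:
            return logits
        # Sum of per-level multi-label losses (equal weights assumed).
        loss_fn = nn.BCEWithLogitsLoss()
        loss = sum(loss_fn(l, t.float()) for l, t in zip(logits, level_targets))
        return loss, logits
```

Training on the summed loss explicitly pushes lower layers toward the coarser hierarchy levels while the top layer handles the finest labels, which matches the parameter-utilization argument in the abstract.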
Related papers
- A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks [7.72751543977484]
This work investigates the effectiveness of BERT-based contextual embeddings in active learning (AL) tasks under cold-start scenarios.
Our primary contribution is the proposal of a more robust fine-tuning pipeline - DoTCAL.
Our evaluation contrasts BERT-based embeddings with other prevalent text representation paradigms, including Bag of Words (BoW), Latent Semantic Indexing (LSI) and FastText.
arXiv Detail & Related papers (2024-07-24T13:50:21Z) - Imbalanced Multi-label Classification for Business-related Text with
Moderately Large Label Spaces [0.30458514384586394]
We evaluated four different methods for multi-label text classification using a specific imbalanced business dataset.
Fine-tuned BERT outperforms the other three methods by a significant margin, achieving high accuracy.
These findings highlight the effectiveness of fine-tuned BERT for multi-label text classification tasks, and suggest that it may be a useful tool for businesses.
arXiv Detail & Related papers (2023-06-12T11:51:50Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- GUDN A novel guide network for extreme multi-label text classification [12.975260278131078]
This paper constructs a novel guide network (GUDN) that helps fine-tune the pre-trained model and guide the subsequent classification.
We also use the raw label semantics to effectively explore the latent space between texts and labels, which can further improve prediction accuracy.
arXiv Detail & Related papers (2022-01-10T07:33:36Z)
- A Sentence-level Hierarchical BERT Model for Document Classification with Limited Labelled Data [5.123298347655086]
This work introduces a long-text-specific model -- the Hierarchical BERT Model (HBM) -- that learns sentence-level features of the text and works well in scenarios with limited data.
Various evaluation experiments have demonstrated that HBM can achieve higher performance in document classification than the previous state-of-the-art methods with only 50 to 200 labelled instances.
arXiv Detail & Related papers (2021-06-12T10:45:24Z)
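
The HBM summary above centers on sentence-level features for long documents. The following is a generic sketch of that hierarchical pattern under assumed choices (a bert-base-uncased sentence encoder, self-attention plus mean pooling over sentence vectors), not necessarily HBM's exact architecture.

```python
# Generic sketch of a sentence-level hierarchical encoder in the spirit of HBM
# (not the authors' implementation): encode each sentence with BERT, then
# aggregate the sentence vectors with self-attention into a document vector.
import torch
import torch.nn as nn
from transformers import AutoModel

class HierarchicalSentenceEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.sent_encoder = AutoModel.from_pretrained(model_name)
        hidden = self.sent_encoder.config.hidden_size
        self.sent_attention = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_sentences, seq_len) for a single document.
        sent_vecs = self.sent_encoder(input_ids, attention_mask=attention_mask
                                      ).last_hidden_state[:, 0]       # [CLS] per sentence
        sent_vecs = sent_vecs.unsqueeze(0)                             # (1, num_sentences, hidden)
        ctx, _ = self.sent_attention(sent_vecs, sent_vecs, sent_vecs)  # sentences attend to each other
        doc_vec = ctx.mean(dim=1)                                      # pool to a document vector
        return self.classifier(doc_vec)
```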
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
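
The MATCH summary above mentions regularizing each child label's output probability by its parents. Below is a minimal sketch of one such hierarchy regularizer, a hinge penalty applied when a child's predicted probability exceeds its parent's; the exact regularizers in MATCH may differ, and the parent_of mapping and the 0.1 weight are assumptions.

```python
# Minimal sketch of hierarchy-aware output regularization in the spirit of MATCH
# (not the authors' exact formulation): penalize cases where a child label's
# predicted probability exceeds its parent's.
import torch

def parent_child_regularizer(logits, parent_of):
    """logits: (batch, num_labels); parent_of: dict mapping child index -> parent index."""
    probs = torch.sigmoid(logits)
    penalty = 0.0
    for child, parent in parent_of.items():
        # Hinge-style penalty: a child's probability should not exceed its parent's.
        penalty = penalty + torch.clamp(probs[:, child] - probs[:, parent], min=0).mean()
    return penalty

# Illustrative usage: total_loss = bce_loss + 0.1 * parent_child_regularizer(logits, parent_of)
```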
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
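
The empirical LMTC study above combines BERT with label-wise attention networks (LWANs). A minimal sketch of a label-wise attention head on top of BERT follows; the label count, the single-query-and-scorer-per-label parameterization, and the checkpoint are assumptions rather than the paper's exact model.

```python
# Minimal sketch of a label-wise attention (LWAN-style) head on top of BERT
# (not the authors' exact model): each label attends over the token representations
# to build its own document vector, which is then scored for that label.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertLWAN(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=4000):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))  # one attention query per label
        self.label_scorers = nn.Parameter(torch.randn(num_labels, hidden))  # one scoring vector per label

    def forward(self, input_ids, attention_mask):
        tokens = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state  # (B, T, H)
        # Attention weights of every label over every token, with padding masked out.
        scores = torch.einsum("bth,lh->blt", tokens, self.label_queries)
        scores = scores.masked_fill(attention_mask.unsqueeze(1) == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)                        # (B, L, T)
        label_docs = torch.einsum("blt,bth->blh", attn, tokens)     # per-label document vectors
        logits = (label_docs * self.label_scorers).sum(-1)          # (B, L)
        return logits
```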
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
- BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT [53.63288887672302]
Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks.
We find that, surprisingly, the output layer of BERT can reconstruct the input sentence when given each hidden layer of BERT directly as input.
We propose a quite simple method to boost the performance of BERT.
arXiv Detail & Related papers (2020-01-25T13:35:34Z)
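
A small probe sketch of the observation above, applying BERT's masked-LM output head to every hidden layer and measuring how well the input tokens are reconstructed. It assumes the Hugging Face transformers layout in which BertForMaskedLM exposes the encoder as .bert and the MLM head as .cls; it is not the authors' code or their proposed boosting method.

```python
# Probe sketch: decode each BERT hidden layer with the masked-LM output head
# and report how often the original input tokens are recovered.
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

inputs = tokenizer("BERT layers are surprisingly redundant.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model.bert(**inputs, output_hidden_states=True).hidden_states

for layer, h in enumerate(hidden_states):
    preds = model.cls(h).argmax(-1)                       # decode this layer with the output head
    acc = (preds == inputs["input_ids"]).float().mean().item()
    print(f"layer {layer:2d}: token reconstruction accuracy {acc:.2f}")
```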
This list is automatically generated from the titles and abstracts of the papers in this site.