Comparing BERT against traditional machine learning text classification
- URL: http://arxiv.org/abs/2005.13012v2
- Date: Tue, 12 Jan 2021 15:48:52 GMT
- Title: Comparing BERT against traditional machine learning text classification
- Authors: Santiago González-Carvajal and Eduardo C. Garrido-Merchán
- Abstract summary: The BERT model has arisen as a popular state-of-the-art machine learning model in recent years.
The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default on NLP tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The BERT model has arisen as a popular state-of-the-art machine learning
model in recent years, able to cope with multiple NLP tasks such as supervised text
classification without human supervision. Its flexibility to cope with any type of
corpus while delivering great results has made this approach very popular, not only in
academia but also in industry, although many other approaches have been used over the
years with success. In this work, we first present BERT and include a brief review of
classical NLP approaches. Then, we empirically test, with a suite of experiments
covering different scenarios, the behaviour of BERT against a traditional TF-IDF
vocabulary fed to machine learning algorithms. The purpose of this work is to add
empirical evidence to support or refute the use of BERT as a default on NLP tasks.
Experiments show the superiority of BERT and its independence from features of the NLP
problem, such as the language of the text, adding empirical evidence for using BERT as
a default technique in NLP problems.
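To make the comparison concrete, below is a minimal sketch of the two families of approaches the abstract contrasts: a TF-IDF vocabulary fed to a classical classifier versus a fine-tuned BERT model. The toy data, the `bert-base-uncased` checkpoint, and the hyperparameters are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch (not the paper's exact setup): TF-IDF + logistic regression
# versus a fine-tuned BERT classifier on a toy binary sentiment task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the movie was great", "terrible plot and acting",
         "a wonderful experience", "boring and too long"]
labels = [1, 0, 1, 0]

# --- Traditional baseline: TF-IDF features fed to a classical classifier ---
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
baseline.fit(texts, labels)
print(baseline.predict(["great acting"]))

# --- BERT: fine-tune a pre-trained encoder with a classification head ---
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed learning rate

model.train()
for _ in range(3):  # a few passes over the toy data
    out = model(**enc, labels=torch.tensor(labels))
    out.loss.backward()
    opt.step()
    opt.zero_grad()

model.eval()
with torch.no_grad():
    test = tok(["great acting"], return_tensors="pt")
    print(model(**test).logits.argmax(dim=-1))
```

In the paper's experiments, the same contrast is run across several corpora and languages; the sketch only illustrates the general shape of each pipeline.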
Related papers
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have a potential ability to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can actually generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- The Topological BERT: Transforming Attention into Topology for Natural Language Processing [0.0]
This paper introduces a text classifier using topological data analysis.
We use BERT's attention maps, transformed into attention graphs, as the only input to that classifier (a rough sketch of this attention-to-graph step appears after this list).
The model can solve tasks such as distinguishing spam from ham messages, recognizing whether a sentence is grammatically correct, or evaluating a movie review as negative or positive.
arXiv Detail & Related papers (2022-06-30T11:25:31Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
- On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation [12.896747108919968]
In this paper, we introduce a novel multi-scale essay representation for BERT that can be jointly learned.
Experimental results show that our approach derives much benefit from the joint learning of multi-scale essay representations.
Our multi-scale essay representation also generalizes well to the CommonLit Readability Prize data set.
arXiv Detail & Related papers (2022-05-08T10:36:54Z)
- BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives [0.0]
BERT has revolutionized the NLP field by enabling transfer learning with large language models.
This article studies how to better cope with the different embeddings provided by the BERT output layer and with the use of language-specific instead of multilingual models.
arXiv Detail & Related papers (2022-01-10T15:05:05Z)
- Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech [5.543220407902113]
Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications.
In this paper, we explored various feature engineering techniques ranging from different embeddings to conventional NLP algorithms.
We conclude that BERT-based embeddings give the most useful features for this problem and have the capacity to be made into a practical, robust model.
arXiv Detail & Related papers (2021-04-23T04:19:15Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method [29.352569563032056]
We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into a pre-trained BERT.
Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model.
arXiv Detail & Related papers (2020-10-23T17:00:26Z)
- FewJoint: A Few-shot Learning Benchmark for Joint Language Understanding [55.38905499274026]
Few-shot learning is one of the key future steps in machine learning.
FewJoint is a novel Few-Shot Learning benchmark for NLP.
arXiv Detail & Related papers (2020-09-17T08:17:12Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm, the BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
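The Topological BERT entry above builds its classifier on attention graphs derived from BERT's attention maps. As a rough, hedged illustration of that attention-to-graph step only (not the paper's actual pipeline; the checkpoint, head-averaging choice, and edge threshold are assumptions), the conversion could look like this:

```python
# Illustrative sketch: extract BERT attention maps and turn one into a weighted
# attention graph that could feed a downstream classifier. Checkpoint, layer/head
# aggregation, and threshold are assumptions, not the Topological BERT recipe.
import networkx as nx
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "the plot was predictable but the acting saved it"
enc = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.attentions is a tuple of (batch, heads, seq_len, seq_len) tensors, one per layer.
# Average the heads of the last layer to get a single token-to-token weight matrix.
attn = out.attentions[-1].mean(dim=1)[0]              # (seq_len, seq_len)
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())

# Keep edges above a threshold to obtain a sparse directed attention graph.
graph = nx.DiGraph()
threshold = 0.05
for i, src in enumerate(tokens):
    for j, dst in enumerate(tokens):
        w = attn[i, j].item()
        if w > threshold:
            graph.add_edge(f"{i}:{src}", f"{j}:{dst}", weight=w)

print(graph.number_of_nodes(), graph.number_of_edges())
```

The resulting graph (or topological features computed from it) would then serve as the classifier input; designing that topological classifier is the contribution of the cited paper.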
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.