Linear Classifier: An Often-Forgotten Baseline for Text Classification
- URL: http://arxiv.org/abs/2306.07111v1
- Date: Mon, 12 Jun 2023 13:39:54 GMT
- Title: Linear Classifier: An Often-Forgotten Baseline for Text Classification
- Authors: Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, and Chih-Jen Lin
- Abstract summary: We argue for the importance of running a simple baseline, such as a linear classifier on bag-of-words features, alongside advanced methods.
Advanced models such as BERT may achieve the best results only if properly applied.
- Score: 12.792276278777532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained language models such as BERT are popular solutions
for text classification. Due to the superior performance of these advanced
methods, nowadays, people often directly train them for a few epochs and deploy
the obtained model. In this opinion paper, we point out that this practice may
not always yield satisfactory results. We argue for the importance of running a
simple baseline, such as a linear classifier on bag-of-words features, alongside
advanced methods. First, for many text datasets, linear methods show competitive
performance, high efficiency, and robustness. Second, advanced models such as
BERT may achieve the best results only if properly applied. Simple baselines
help to confirm whether the results of advanced models are acceptable. Our
experimental results fully support these points.
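As a concrete illustration of the advocated baseline, the following is a minimal sketch using scikit-learn with TF-IDF bag-of-words features and a linear SVM; the 20 Newsgroups corpus and the hyperparameters are illustrative assumptions, not necessarily the authors' exact tooling or settings.

```python
# Minimal linear-classifier baseline on bag-of-words (TF-IDF) features.
# Illustrative sketch: dataset, features, and hyperparameters are assumptions,
# not the paper's exact experimental setup.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Illustrative dataset; substitute your own corpus.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Bag-of-words features with TF-IDF weighting.
vectorizer = TfidfVectorizer(sublinear_tf=True, min_df=2)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Linear SVM; C is the main hyperparameter that may need tuning.
clf = LinearSVC(C=1.0)
clf.fit(X_train, train.target)

print("test accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```

A baseline of this kind typically trains in seconds to minutes on a CPU, so it can serve as the sanity check the paper advocates: if a fine-tuned model such as BERT does not clearly beat it, the advanced model may not have been properly applied.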
Related papers
- READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data [7.152603583363887]
Pre-trained transformer models such as BERT have shown massive gains across many text classification tasks.
This paper proposes a method that encapsulates reinforcement learning-based text generation and semi-supervised adversarial learning approaches.
Our method READ, Reinforcement-based Adversarial learning, utilizes an unlabeled dataset to generate diverse synthetic text through reinforcement learning.
arXiv Detail & Related papers (2025-01-14T11:39:55Z) - Towards Efficient Active Learning in NLP via Pretrained Representations [1.90365714903665]
Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications.
We drastically expedite this process by using pretrained representations of LLMs within the active learning loop.
Our strategy yields similar performance to fine-tuning all the way through the active learning loop but is orders of magnitude less computationally expensive.
arXiv Detail & Related papers (2024-02-23T21:28:59Z) - Understanding prompt engineering may not require rethinking
generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z) - Are We Really Making Much Progress in Text Classification? A Comparative Review [5.33235750734179]
We analyze various methods for single-label and multi-label text classification across well-known datasets.
We highlight the superiority of discriminative language models like BERT over generative models for supervised tasks.
arXiv Detail & Related papers (2022-04-08T09:28:20Z) - Finding the Winning Ticket of BERT for Binary Text Classification via
Adaptive Layer Truncation before Fine-tuning [7.797987384189306]
We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks.
The results show that there indeed exist smaller sub-networks that perform better than the full model.
arXiv Detail & Related papers (2021-11-22T02:22:47Z) - ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text
Classification Models [0.0]
Deep learning models based on CNNs, LSTMs, and Transformers have been the de facto choice for text classification.
We show that these systems are over-reliant on the words in the text that are important for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization to unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.