Linear Classifier: An Often-Forgotten Baseline for Text Classification
- URL: http://arxiv.org/abs/2306.07111v1
- Date: Mon, 12 Jun 2023 13:39:54 GMT
- Title: Linear Classifier: An Often-Forgotten Baseline for Text Classification
- Authors: Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, and Chih-Jen Lin
- Abstract summary: We argue for the importance of running a simple baseline, such as a linear classifier on bag-of-words features, alongside advanced methods.
Advanced models such as BERT may achieve the best results only if properly applied.
- Score: 12.792276278777532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained language models such as BERT are popular solutions
for text classification. Due to the superior performance of these advanced
methods, nowadays, people often directly train them for a few epochs and deploy
the obtained model. In this opinion paper, we point out that this practice may
not always yield satisfactory results. We argue for the importance of running a
simple baseline, such as a linear classifier on bag-of-words features, alongside
advanced methods. First, for many text datasets, linear methods show competitive
performance, high efficiency, and robustness. Second, advanced models such as
BERT may achieve the best results only if properly applied. Simple baselines
help to confirm whether the results of advanced models are acceptable. Our
experimental results fully support these points.
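As a concrete illustration of the advocated baseline, the following is a minimal sketch using scikit-learn with TF-IDF bag-of-words features and a linear SVM; the 20 Newsgroups corpus and the hyperparameters are illustrative assumptions, not necessarily the authors' exact tooling or settings.

```python
# Minimal linear-classifier baseline on bag-of-words (TF-IDF) features.
# Illustrative sketch: dataset, features, and hyperparameters are assumptions,
# not the paper's exact experimental setup.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Illustrative dataset; substitute your own corpus.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Bag-of-words features with TF-IDF weighting.
vectorizer = TfidfVectorizer(sublinear_tf=True, min_df=2)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Linear SVM; C is the main hyperparameter that may need tuning.
clf = LinearSVC(C=1.0)
clf.fit(X_train, train.target)

print("test accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```

A baseline of this kind typically trains in seconds to minutes on a CPU, so it can serve as the sanity check the paper advocates: if a fine-tuned model such as BERT does not clearly beat it, the advanced model may not have been properly applied.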
Related papers
- READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data [7.152603583363887]
Pre-trained transformer models such as BERT have shown massive gains across many text classification tasks.
This paper proposes a method that encapsulates reinforcement learning-based text generation and semi-supervised adversarial learning approaches.
Our method READ, Reinforcement-based Adversarial learning, utilizes an unlabeled dataset to generate diverse synthetic text through reinforcement learning.
arXiv Detail & Related papers (2025-01-14T11:39:55Z) - Towards Efficient Active Learning in NLP via Pretrained Representations [1.90365714903665]
Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications.
We drastically expedite this process by using pretrained representations of LLMs within the active learning loop.
Our strategy yields similar performance to fine-tuning all the way through the active learning loop but is orders of magnitude less computationally expensive.
arXiv Detail & Related papers (2024-02-23T21:28:59Z) - Understanding prompt engineering may not require rethinking
generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z) - Are We Really Making Much Progress in Text Classification? A Comparative Review [5.33235750734179]
We analyze various methods for single-label and multi-label text classification across well-known datasets.
We highlight the superiority of discriminative language models like BERT over generative models for supervised tasks.
arXiv Detail & Related papers (2022-04-08T09:28:20Z) - Finding the Winning Ticket of BERT for Binary Text Classification via
Adaptive Layer Truncation before Fine-tuning [7.797987384189306]
We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks.
The results show that there indeed exist smaller sub-networks that perform better than the full model.
arXiv Detail & Related papers (2021-11-22T02:22:47Z) - ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text
Classification Models [0.0]
Deep learning models based on CNNs, LSTMs, and Transformers have been the de facto choice for text classification.
We show that these systems are over-reliant on the words in the text that are important for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization to unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.