Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments]
- URL: http://arxiv.org/abs/2007.05651v2
- Date: Thu, 8 Oct 2020 22:45:08 GMT
- Title: Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments]
- Authors: Jinfeng Li, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
- Abstract summary: We show that the size, the label ratio, and the label cleanliness of a dataset significantly impact the quality of semantic tagging.
Simple models achieve similar tagging quality to deep models on large datasets, but the runtime of simple models is much shorter.
- Score: 26.48209520599515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic tagging, which has extensive applications in text mining, predicts
whether a given piece of text conveys the meaning of a given semantic tag. The
problem of semantic tagging is largely solved with supervised learning and
today, deep learning models are widely perceived to be better for semantic
tagging. However, there is no comprehensive study supporting the popular
belief. Practitioners often have to train different types of models for each
semantic tagging task to identify the best model. This process is both
expensive and inefficient.
We embark on a systematic study to investigate the following question: Are
deep models the best-performing models for all semantic tagging tasks? To answer
this question, we compare deep models against "simple models" over datasets
with varying characteristics. Specifically, we select three prevalent deep
models (i.e. CNN, LSTM, and BERT) and two simple models (i.e. LR and SVM), and
compare their performance on the semantic tagging task over 21 datasets.
Results show that the size, the label ratio, and the label cleanliness of a
dataset significantly impact the quality of semantic tagging. Simple models
achieve similar tagging quality to deep models on large datasets, but the
runtime of simple models is much shorter. Moreover, simple models can achieve
better tagging quality than deep models when the target datasets show worse
label cleanliness and/or more severe imbalance. Based on these findings, our
study can systematically guide practitioners in selecting the right learning
model for their semantic tagging task.
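The abstract frames semantic tagging as binary classification: given a piece of text and a tag, predict whether the text conveys the tag's meaning. A minimal sketch of the "simple model" side of the comparison is a bag-of-words logistic regression (one of the LR baselines the paper names); the tag, vocabulary, and toy data below are illustrative assumptions, not taken from the paper's datasets.

```python
# Sketch of a "simple model" baseline for binary semantic tagging:
# bag-of-words features with logistic regression trained by gradient descent.
import math
from collections import Counter

def featurize(text, vocab):
    # Count occurrences of each vocabulary word in the text.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train_lr(X, y, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on logistic loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the logistic loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0

# Hypothetical task: does a review convey the "price" tag?
texts = ["great price and cheap shipping", "price was fair",
         "friendly staff", "lovely decor and staff"]
labels = [1, 1, 0, 0]
vocab = sorted({w for t in texts for w in t.lower().split()})
X = [featurize(t, vocab) for t in texts]
w, b = train_lr(X, labels)
print(predict("what a cheap price", vocab, w, b))  # → 1
```

The appeal of such baselines, per the abstract, is that on large or noisy datasets they can match deep models in tagging quality while training in a fraction of the time.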
Related papers
- GRAM: A Generative Foundation Reward Model for Reward Generalization [48.63394690265176]
We develop a generative reward model that is first trained via large-scale unsupervised learning and then fine-tuned via supervised learning.
This model generalizes well across several tasks, including response ranking, reinforcement learning from human feedback, and task adaptation with fine-tuning.
arXiv Detail & Related papers (2025-06-17T04:34:27Z)
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- CerberusDet: Unified Multi-Dataset Object Detection [0.0]
CerberusDet is a framework with a multi-headed model designed for handling multiple object detection tasks.
The proposed model is built on the YOLO architecture and efficiently shares visual features from both backbone and neck components.
CerberusDet achieved state-of-the-art results with 36% less inference time.
arXiv Detail & Related papers (2024-07-17T15:00:35Z)
- When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery [23.464350453312584]
Foundation models, i.e. very large deep learning models, have demonstrated impressive performance in various language and vision tasks.
Are foundation models always a suitable choice for different remote sensing tasks, and if not, when?
This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification.
arXiv Detail & Related papers (2024-04-17T23:30:48Z)
- Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation [0.8848340429852071]
We study the regularization methods' effects on various classification models when only a few labeled data are available.
We compare a simple word embedding-based model, which is simple but effective, with complex models.
We evaluate the regularization effects on four text classification datasets.
arXiv Detail & Related papers (2024-02-27T07:26:16Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Understanding How Model Size Affects Few-shot Instruction Prompting [0.0]
We investigate how the model size affects the model's ability to discriminate a word's meaning in a given context.
We introduce a dataset called DeltaWords, which evaluates a model's ability to follow instructions.
We show a weak inverse scaling trend, where task accuracy degrades as model size increases.
arXiv Detail & Related papers (2022-12-04T19:59:52Z)
- Model Extraction Attack against Self-supervised Speech Models [52.81330435990717]
Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
arXiv Detail & Related papers (2022-11-29T09:28:05Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch.
In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models?
Our experimental results show that bidirectional LSTM models can achieve significantly higher results than a BERT model for a small dataset and these simple models get trained in much less time than tuning the pre-trained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.