A multi-model-based deep learning framework for short text multiclass
classification with the imbalanced and extremely small data set
- URL: http://arxiv.org/abs/2206.12027v1
- Date: Fri, 24 Jun 2022 00:51:02 GMT
- Title: A multi-model-based deep learning framework for short text multiclass
classification with the imbalanced and extremely small data set
- Authors: Jiajun Tong, Zhixiao Wang, Xiaobin Rui
- Abstract summary: This paper proposes a multimodel-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small data set.
It retains the state-of-the-art baseline performance in terms of precision, recall, accuracy, and F1 score.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification plays an important role in many practical applications.
In the real world, there are extremely small datasets. Most existing methods
adopt pre-trained neural network models to handle this kind of dataset.
However, these methods are either difficult to deploy on mobile devices because
of their large output size or cannot fully extract the deep semantic
information between phrases and clauses. This paper proposes a multimodel-based
deep learning framework for short-text multiclass classification with an
imbalanced and extremely small data set. Our framework mainly includes five
layers: the encoder layer uses DistilBERT to obtain context-sensitive dynamic
word vectors that are difficult to represent in traditional feature engineering
methods. Since the transformer part of this layer is distilled, our framework
is compressed. Then, we use the next two layers to extract deep semantic
information. The output of the encoder layer is sent to a bidirectional LSTM
network, and the feature matrix is extracted hierarchically through the LSTM at
the word and sentence level to obtain the fine-grained semantic representation.
After that, the max-pooling layer converts the feature matrix into a
lower-dimensional matrix, preserving only the obvious features. Finally, the
feature matrix is taken as the input of a fully connected softmax layer, which
applies a softmax function to convert the predicted linear vector into a
probability for each class. Extensive
experiments on two public benchmarks demonstrate the effectiveness of our
proposed approach on an extremely small data set. It retains the
state-of-the-art baseline performance in terms of precision, recall, accuracy,
and F1 score, and the model size, training time, and convergence epoch show
that our method is lighter and faster to deploy on mobile
devices.
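The last two stages of the pipeline, max-pooling the feature matrix and then converting the linear output into class probabilities, can be sketched in plain Python. This is a minimal illustration only; the matrix shapes, window size, and logit values below are toy assumptions, not the paper's settings:

```python
import math

def max_pool_rows(matrix, window):
    """Slide a non-overlapping window over each row and keep the maximum,
    shrinking the feature matrix while preserving only the strongest
    activations (the 'obvious features')."""
    return [[max(row[i:i + window]) for i in range(0, len(row), window)]
            for row in matrix]

def softmax(logits):
    """Convert a linear output vector into a probability for each class."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2x4 feature matrix -> 2x2 matrix after pooling with window=2
pooled = max_pool_rows([[0.1, 0.9, 0.3, 0.2],
                        [0.5, 0.4, 0.8, 0.7]], window=2)

# Toy 3-class linear vector from the fully connected layer -> probabilities
probs = softmax([2.0, 1.0, 0.1])
```

The softmax outputs always sum to 1, so they can be read directly as the probability of the text belonging to each class.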
Related papers
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and a low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Combining Primal and Dual Representations in Deep Restricted Kernel Machines Classifiers [17.031744210104556]
We propose a new method for DRKM classification coupling the objectives of KPCA and classification levels.
The classification level can be formulated as an LSSVM or as a primal feature map, combining depth in terms of levels and layers.
We show that our developed algorithm can effectively learn from small datasets, while using less memory than the convolutional neural network (CNN) with high-dimensional data.
arXiv Detail & Related papers (2023-06-12T10:39:57Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning framework based on an Auto-Encoder (ODLAE).
Based on the auto-encoder and its reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, which fuses the classification results of each hidden layer, and a feature-level fusion strategy, which leverages a self-attention mechanism to fuse the outputs of every hidden layer.
arXiv Detail & Related papers (2022-01-19T02:14:57Z)
- Hierarchical Neural Network Approaches for Long Document Classification [3.6700088931938835]
We employ pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple where we divide the input data into chunks and then pass this through base models of BERT and USE.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Hierarchical Text Classification of Urdu News using Deep Neural Network [0.0]
This paper proposes a deep learning model for hierarchical text classification of news in Urdu language.
The dataset consists of 51,325 sentences from 8 online news websites covering the following genres: Sports, Technology, and Entertainment.
arXiv Detail & Related papers (2021-07-07T11:06:11Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
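The orthogonality property that OSL enforces can be illustrated in plain Python. The weight vectors below are toy values chosen by hand to be mutually orthogonal; this only demonstrates the property itself, not the paper's construction or training procedure:

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Toy classification-layer weight vectors, one per class, chosen so that
# every pair is orthogonal (illustrative values only).
weights = [
    [1.0,  1.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 1.0],
]

# Every pairwise dot product is zero, confirming mutual orthogonality,
# which is what encourages a large decision margin between classes.
pairwise = [dot(weights[i], weights[j])
            for i in range(len(weights))
            for j in range(i + 1, len(weights))]
```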
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.