Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for
Cyberbullying Classification
- URL: http://arxiv.org/abs/2206.02206v1
- Date: Sun, 5 Jun 2022 15:46:21 GMT
- Title: Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for
Cyberbullying Classification
- Authors: Raunak Joshi, Abhishek Gupta
- Abstract summary: We present a performance-based comparison between a simple transformer-based network and a Res-CNN-BiLSTM-based network for the cyberbullying text classification problem.
The results show that the transformer we trained, with 0.65 million parameters, significantly outperforms the Res-CNN-BiLSTM with 48.82 million parameters, while training faster and producing more generalized metrics.
- Score: 4.2317391919680425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification with bidirectional LSTM architectures is
computationally expensive and time consuming to train. Transformers were
introduced to address this and give good performance compared to traditional
deep learning architectures. In this paper we present a performance-based
comparison between a simple transformer-based network and a Res-CNN-BiLSTM
based network for the cyberbullying text classification problem. The results
show that the transformer we trained, with 0.65 million parameters,
significantly outperforms the Res-CNN-BiLSTM with 48.82 million parameters,
while training faster and producing more generalized metrics. The paper also
compares a 1-dimensional character-level embedding network and a
100-dimensional GloVe embedding network with the transformer.
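As a rough illustration of the "simple transformer" side of this comparison, the sketch below (not the authors' code; the vocabulary size, sequence length, and layer sizes are illustrative assumptions) builds a single-block transformer encoder classifier in PyTorch and counts its parameters, which stays well under a million, in contrast to the tens of millions of parameters of a Res-CNN-BiLSTM-style baseline.

```python
# Minimal sketch of a small transformer text classifier, assuming a learned
# token embedding, one encoder block, mean pooling, and a binary
# cyberbullying / not-cyberbullying head. Hyperparameters are hypothetical.
import torch
import torch.nn as nn


class SimpleTransformerClassifier(nn.Module):
    def __init__(self, vocab_size=20_000, d_model=32, nhead=2,
                 dim_feedforward=64, num_layers=1, max_len=128, num_classes=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(positions)
        x = self.encoder(x)                      # (batch, seq_len, d_model)
        x = x.mean(dim=1)                        # mean-pool over tokens
        return self.head(x)                      # class logits


model = SimpleTransformerClassifier()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")       # well under 1M with these sizes

dummy = torch.randint(0, 20_000, (4, 128))       # batch of 4 token-id sequences
logits = model(dummy)                            # shape (4, 2)
```

Note that most of the parameters in a model this small sit in the token embedding table; the encoder block itself adds comparatively little on top of it, which is why such classifiers can stay compact while still training quickly.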
Related papers
- MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z) - Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z) - Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
arXiv Detail & Related papers (2022-09-23T09:42:44Z) - On the Prediction Network Architecture in RNN-T for ASR [1.7262456746016954]
We compare 4 types of prediction networks based on a common state-of-the-art Conformer encoder.
Inspired by our scoreboard, we propose a new simple prediction network architecture, N-Concat.
arXiv Detail & Related papers (2022-06-29T13:11:46Z) - TRT-ViT: TensorRT-oriented Vision Transformer [19.173764508139016]
A family of TensorRT-oriented Transformers is presented, abbreviated as TRT-ViT.
Extensive experiments demonstrate that TRT-ViT significantly outperforms existing ConvNets and vision Transformers.
arXiv Detail & Related papers (2022-05-19T14:20:25Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - Container: Context Aggregation Network [83.12004501984043]
Recent findings show that simple MLP-based solutions without any traditional convolutional or Transformer components can produce effective visual representations.
We present CONTAINER (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named CONTAINER-LIGHT, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Rewiring the Transformer with Depth-Wise LSTMs [55.50278212605607]
We present a Transformer with depth-wise LSTMs connecting cascading Transformer layers and sub-layers.
Experiments with the 6-layer Transformer show significant BLEU improvements in both WMT 14 English-German / French tasks and the OPUS-100 many-to-many multilingual NMT task.
arXiv Detail & Related papers (2020-07-13T09:19:34Z) - Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR
in Transfer Learning [37.55706646713447]
We propose a hybrid Transformer-LSTM based architecture to improve low-resource end-to-end ASR.
We conduct experiments on our in-house Malay corpus which contains limited labeled data and a large amount of extra text.
Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER.
arXiv Detail & Related papers (2020-05-21T00:56:42Z)