A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu
Sentiment Analysis
- URL: http://arxiv.org/abs/2003.05443v1
- Date: Wed, 11 Mar 2020 04:08:27 GMT
- Title: A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu
Sentiment Analysis
- Authors: Faiza Mehmood, Muhammad Usman Ghani, Muhammad Ali Ibrahim, Rehab
Shehzadi, Muhammad Nabeel Asim
- Abstract summary: This paper provides three neural word embeddings trained using the most widely used approaches, namely Word2vec, FastText, and GloVe.
Considering the lack of publicly available benchmark datasets, it also provides a first-ever Roman Urdu dataset consisting of 3,241 sentiments annotated with positive, negative, and neutral classes.
It proposes a novel precisely extreme multi-channel hybrid methodology which outperforms state-of-the-art adapted machine and deep learning approaches by 9% and 4%, respectively, in terms of F1-score.
- Score: 0.8812173669205371
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To accelerate the performance of various Natural Language Processing
tasks for Roman Urdu, this paper provides, for the very first time, three neural
word embeddings trained using the most widely used approaches, namely Word2vec,
FastText, and GloVe. The integrity of the generated neural word embeddings is
evaluated using intrinsic and extrinsic evaluation approaches. Considering the
lack of publicly available benchmark datasets, it provides a first-ever Roman
Urdu dataset consisting of 3,241 sentiments annotated with positive, negative,
and neutral classes. To provide benchmark baseline performance over the
presented dataset, we adapt diverse machine learning (Support Vector Machine,
Logistic Regression, Naive Bayes), deep learning (convolutional neural network,
recurrent neural network), and hybrid approaches. The effectiveness of the
generated neural word embeddings is evaluated by comparing the performance of
machine and deep learning based methodologies using 7 and 5 distinct feature
representation approaches, respectively. Finally, the paper proposes a novel
precisely extreme multi-channel hybrid methodology which outperforms
state-of-the-art adapted machine and deep learning approaches by 9% and 4%,
respectively, in terms of F1-score.
Keywords: Roman Urdu Sentiment Analysis, Pretrained word embeddings for Roman
Urdu, Word2Vec, GloVe, FastText
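The multi-channel idea described in the abstract can be sketched as follows: each channel looks up the same sentence in a different pretrained embedding space (Word2vec, FastText, GloVe), and the per-channel features are combined before classification. The toy 4-dimensional vectors, the Roman Urdu tokens, and the mean-pooling "encoder" below are illustrative stand-ins, not the paper's trained embeddings or its CNN/LSTM channels.

```python
from typing import Dict, List

# Hypothetical toy embedding tables; a real channel would load pretrained
# Word2vec, FastText, or GloVe vectors for Roman Urdu.
TOY_EMBEDDINGS: Dict[str, Dict[str, List[float]]] = {
    "word2vec": {"acha": [0.9, 0.1, 0.0, 0.2], "nahi": [0.1, 0.8, 0.3, 0.0]},
    "fasttext": {"acha": [0.8, 0.2, 0.1, 0.1], "nahi": [0.2, 0.7, 0.4, 0.1]},
    "glove":    {"acha": [0.7, 0.0, 0.2, 0.3], "nahi": [0.0, 0.9, 0.2, 0.2]},
}

def channel_features(tokens: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Mean-pool the token vectors of one embedding channel."""
    dim = len(next(iter(table.values())))
    vecs = [table.get(t, [0.0] * dim) for t in tokens]  # zero vector for OOV
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def multi_channel_features(tokens: List[str]) -> List[float]:
    """Concatenate the pooled features of all three embedding channels."""
    feats: List[float] = []
    for table in TOY_EMBEDDINGS.values():
        feats.extend(channel_features(tokens, table))
    return feats

features = multi_channel_features(["acha", "nahi"])
print(len(features))  # 3 channels x 4 dims = 12
```

In the paper's actual architecture, each channel feeds a neural encoder rather than a mean-pool, but the principle of concatenating complementary embedding views into one feature vector is the same.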
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z) - Multi-label Text Classification using GloVe and Neural Network Models [0.27195102129094995]
Existing solutions include traditional machine learning and deep neural networks for predictions.
This paper proposes a method utilizing the bag-of-words model approach based on the GloVe model and the CNN-BiLSTM network.
The method achieves an accuracy rate of 87.26% on the test set and an F1 score of 0.8737, showcasing promising results.
arXiv Detail & Related papers (2023-10-25T01:30:26Z) - Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z) - Khmer Text Classification Using Word Embedding and Neural Networks [0.0]
We discuss various classification approaches for Khmer text.
A Khmer word embedding model is trained on a 30-million-Khmer-word corpus to construct word vector representations.
We evaluate the performance of different approaches on a news article dataset for both multi-class and multi-label text classification tasks.
arXiv Detail & Related papers (2021-12-13T15:57:32Z) - An Attention Ensemble Approach for Efficient Text Classification of
Indian Languages [0.0]
This paper focuses on the coarse-grained technical domain identification of short text documents in Marathi, a Devanagari script-based Indian language.
A hybrid CNN-BiLSTM attention ensemble model is proposed that competently combines the intermediate sentence representations generated by the convolutional neural network and the bidirectional long short-term memory, leading to efficient text classification.
Experimental results show that the proposed model outperforms various baseline machine learning and deep learning models in the given task, giving the best validation accuracy of 89.57% and f1-score of 0.8875.
arXiv Detail & Related papers (2021-02-20T07:31:38Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - Closed Loop Neural-Symbolic Learning via Integrating Neural Perception,
Grammar Parsing, and Symbolic Reasoning [134.77207192945053]
Prior methods learn the neural-symbolic models using reinforcement learning approaches.
We introduce the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning.
We propose a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error.
arXiv Detail & Related papers (2020-06-11T17:42:49Z) - Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z) - SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text
Classification [15.111940377403252]
We propose 5 different configurations for the semantic matrix-based memory neural network with end-to-end learning manner.
We evaluate our proposed method on two corpora of news articles (AG news, Sogou news).
arXiv Detail & Related papers (2020-03-04T02:00:57Z) - Benchmark Performance of Machine And Deep Learning Based Methodologies
for Urdu Text Document Classification [4.1353427192071015]
This paper provides benchmark performance for Urdu text document classification.
It investigates the performance impact of traditional machine learning based Urdu text document classification methodologies.
For the very first time, it assesses the performance of various deep learning based methodologies for Urdu text document classification.
arXiv Detail & Related papers (2020-03-03T05:49:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.