Investigating Classification Techniques with Feature Selection For
Intention Mining From Twitter Feed
- URL: http://arxiv.org/abs/2001.10380v1
- Date: Wed, 22 Jan 2020 11:55:33 GMT
- Title: Investigating Classification Techniques with Feature Selection For
Intention Mining From Twitter Feed
- Authors: Qadri Mishael and Aladdin Ayesh
- Abstract summary: Micro-blogging service Twitter has more than 200 million registered users who exchange more than 65 million posts per day.
Most of the tweets are written informally and often in slang.
This paper investigates the problem of selecting features that affect the extraction of users' intentions from Twitter feeds.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the last decade, social networks have become the most popular medium for
communication and interaction. As an example, the micro-blogging service Twitter
has more than 200 million registered users who exchange more than 65 million
posts per day. Users express their thoughts, ideas, and even their intentions
through these tweets. Most of the tweets are written informally and often in
slang that contains misspelt and abbreviated words. This paper
investigates the problem of selecting features that affect the extraction of users'
intentions from Twitter feeds based on text mining techniques. It starts by
presenting the method we used to construct our own dataset from extracted
Twitter feeds. Following that, we present two techniques of feature selection
followed by classification. In the first technique, we use Information Gain as
a one-phase feature selection, followed by supervised classification
algorithms. In the second technique, we use a hybrid approach based on a forward
feature selection algorithm, in which two feature selection techniques are employed,
followed by classification algorithms. We examine these two techniques with
four classification algorithms. We evaluate them using our own dataset, and we
critically review the results.
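The first technique in the abstract ranks features by Information Gain before classification. The listing does not give the authors' implementation, so the following is only a minimal sketch of the standard Information Gain definition for a binary "term present / term absent" feature. The function names, the toy tweets, and the labels `intent` / `no_intent` are all hypothetical, and the classification step is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the document'.

    docs is a list of token sets, labels the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy that remains after splitting the documents on the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels, top_k):
    """Return the top_k terms by Information Gain (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary, key=lambda t: (-information_gain(docs, labels, t), t)
    )
    return ranked[:top_k]


# Made-up toy data, not taken from the paper's dataset.
tweets = [
    "i want to buy a new phone",
    "planning to buy tickets tonight",
    "the weather is nice today",
    "that movie was nice",
]
labels = ["intent", "intent", "no_intent", "no_intent"]
docs = [set(t.split()) for t in tweets]

top_terms = rank_terms(docs, labels, top_k=3)
print(top_terms)
```

In a one-phase setup like the one described, the selected terms would then serve as the feature set for a supervised classifier. On data this small, function words can score as highly as content words, which is one reason preprocessing choices matter for informal tweet text.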
Related papers
- Context-Based Tweet Engagement Prediction [0.0]
This thesis investigates how well context alone may be used to predict tweet engagement likelihood.
We employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines.
We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results.
arXiv Detail & Related papers (2023-09-28T08:36:57Z)
- ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs [54.48467003509595]
ChatGPT has shown superior performance in various natural language processing (NLP) tasks.
We propose a novel framework that leverages the power of ChatGPT for specific tasks, such as text classification.
Our method provides a more transparent decision-making process compared with previous text classification methods.
arXiv Detail & Related papers (2023-05-03T19:57:43Z)
- Improved Topic modeling in Twitter through Community Pooling [0.0]
Twitter posts are short and often less coherent than other text documents.
We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community.
Results show that our Community pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets.
arXiv Detail & Related papers (2021-12-20T17:05:32Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings [0.0]
We have proposed an easily applicable automated methodology in which the Daily Mean Similarity Scores show the similarity between the daily tweet corpus and the target words.
The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings.
We have also compared the effectiveness of word versus sentence embeddings when applying our methodology and found that both give almost the same results.
arXiv Detail & Related papers (2021-10-02T18:44:55Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- DeepStyle: User Style Embedding for Authorship Attribution of Short Texts [57.503904346336384]
Authorship attribution (AA) is an important and widely studied research topic with many applications.
Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task.
We propose DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles.
arXiv Detail & Related papers (2021-03-14T15:56:37Z)
- Towards A Sentiment Analyzer for Low-Resource Languages [0.0]
This research aims to analyse the sentiment of users towards a particular trending topic that was being actively and massively discussed at the time.
We use the hashtag #kpujangancurang, which was the trending topic during the Indonesian presidential election in 2019.
This research uses the RapidMiner tool to generate the Twitter data and compares Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of that data.
arXiv Detail & Related papers (2020-11-12T13:50:00Z)
- Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features [63.48764893706088]
This work aims at identifying authors of tweet messages, which are limited to 280 characters.
For our experiments, we use a self-captured database of 40 users, with 120 to 200 tweets per user.
Results using this small set are promising, with the different features providing a classification accuracy between 92% and 98.5%.
arXiv Detail & Related papers (2020-03-24T19:32:11Z)
- Utilizing Deep Learning to Identify Drug Use on Twitter Data [0.0]
The classification power of multiple methods was compared including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers.
The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91.
The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
arXiv Detail & Related papers (2020-03-08T07:52:40Z)