Multi-Class and Automated Tweet Categorization
- URL: http://arxiv.org/abs/2112.03005v1
- Date: Sat, 13 Nov 2021 14:28:47 GMT
- Title: Multi-Class and Automated Tweet Categorization
- Authors: Khubaib Ahmed Qureshi
- Abstract summary: The study reported here aims to detect the tweet category from its text.
The tweet is categorized under 12 specified categories using Text Mining or Natural Language Processing (NLP), and Machine Learning (ML) techniques.
The best ensemble model, Gradient Boosting, achieved an AUC score of 85%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Twitter is among the most prevalent social media platforms, used by
millions of people all over the world. It is used to express ideas and opinions
about political, social, business, sports, health, religion, and various other
categories. The study reported here aims to detect the tweet category from its
text. This becomes quite challenging when the text consists of only 140
characters and is full of noise. The tweet is categorized under 12 specified
categories using Text Mining or Natural Language Processing (NLP) and Machine
Learning (ML) techniques. Twitter provides a huge number of trending topics,
but it is difficult to find out what these trending topics are about. It is
therefore crucial to automatically categorize tweets into general categories
for many information extraction tasks. A large dataset is constructed by
combining two datasets of different natures with varying levels of category
identification complexity. It is annotated by experts under proper guidelines
for increased quality and high agreement values, which makes the proposed
model quite robust. Various types of ML algorithms were used to train and
evaluate the proposed model. These models were explored over three datasets
separately. The dataset is found to be highly non-linear, so complex or
non-linear models perform better. The best ensemble model, Gradient Boosting,
achieved an AUC score of 85%, which is much better than the results of other
related studies.
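The abstract reports a single AUC figure for a 12-category task but does not say how the per-category scores were combined. A common convention, assumed here and not confirmed by the paper, is macro-averaged one-vs-rest AUC: compute a binary AUC for each category against all others, then take the unweighted mean. The sketch below is a minimal pure-Python illustration of that convention; the function names and the three-category toy data are hypothetical.

```python
from typing import Dict, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(y_true: Sequence[str], y_score: Sequence[Dict[str, float]]) -> float:
    """Unweighted mean of the one-vs-rest AUC over every class present in y_true."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]  # class c versus the rest
        scores = [row[c] for row in y_score]           # predicted score for class c
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)


# Hypothetical toy data: true categories and per-category scores for four tweets.
toy_true = ["sports", "politics", "health", "sports"]
toy_score = [
    {"sports": 0.70, "politics": 0.20, "health": 0.10},
    {"sports": 0.30, "politics": 0.60, "health": 0.10},
    {"sports": 0.20, "politics": 0.30, "health": 0.50},
    {"sports": 0.25, "politics": 0.40, "health": 0.35},
]
print(round(macro_ovr_auc(toy_true, toy_score), 4))  # prints 0.9167
```

Macro averaging weights every category equally, so a rare category influences the result as much as a frequent one. A support-weighted average would be the alternative if the reported score were meant to reflect class frequencies.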
Related papers
- Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model [0.356008609689971]
We present several machine learning models and a deep learning model to analyze the sentiment of Persian political tweets.
Deep learning with ParsBERT embedding performs better than machine learning.
arXiv Detail & Related papers (2023-07-15T08:08:38Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks [5.937482215664902]
Social media content is often too noisy for direct use in any application.
It is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making.
We present a new large-scale dataset with 77K human-labeled tweets, sampled from a pool of 24 million tweets across 19 disaster events.
arXiv Detail & Related papers (2021-04-07T12:29:36Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding [26.074367752142198]
We aim at solving the problem of predicting people's ideology, or political tendency.
We estimate it by using Twitter data, and formalize it as a classification problem.
arXiv Detail & Related papers (2020-06-02T00:00:39Z)
- Stance in Replies and Quotes (SRQ): A New Dataset For Learning Stance in Twitter Conversations [8.097870074875729]
We present the largest human-labeled stance dataset for Twitter conversations with over 5200 stance labels.
We include many baseline models for learning the stance in conversations and compare the performance of various models.
arXiv Detail & Related papers (2020-06-01T03:30:08Z)
- The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection [55.390442067381755]
We show that grayscale data can be automatically constructed without human effort.
Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators.
Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.
arXiv Detail & Related papers (2020-04-06T06:34:54Z)