Improved Topic modeling in Twitter through Community Pooling
- URL: http://arxiv.org/abs/2201.00690v1
- Date: Mon, 20 Dec 2021 17:05:32 GMT
- Title: Improved Topic modeling in Twitter through Community Pooling
- Authors: Federico Albanese and Esteban Feuerstein
- Abstract summary: Twitter posts are short and often less coherent than other text documents.
We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community.
Results show that our Community polling method outperformed other methods on the majority of metrics in two heterogeneous datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social networks play a fundamental role in propagation of information and
news. Characterizing the content of the messages becomes vital for different
tasks, like breaking news detection, personalized message recommendation, fake
users detection, information flow characterization and others. However, Twitter
posts are short and often less coherent than other text documents, which makes
it challenging to apply text mining algorithms to these datasets efficiently.
Tweet-pooling (aggregating tweets into longer documents) has been shown to
improve automatic topic decomposition, but the performance achieved in this
task varies depending on the pooling method.
In this paper, we propose a new pooling scheme for topic modeling in Twitter,
which groups tweets whose authors belong to the same community (group of users
who mainly interact with each other but not with other groups) on a user
interaction graph. We present a complete evaluation of this methodology, state
of the art schemes and previous pooling models in terms of the cluster quality,
document retrieval tasks performance and supervised machine learning
classification score. Results show that our Community polling method
outperformed other methods on the majority of metrics in two heterogeneous
datasets, while also reducing the running time. This is useful when dealing
with big amounts of noisy and short user-generated social media texts. Overall,
our findings contribute to an improved methodology for identifying the latent
topics in a Twitter dataset, without the need of modifying the basic machinery
of a topic decomposition model.
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z) - NarrationDep: Narratives on Social Media For Automatic Depression Detection [24.11420537250414]
We have developed a novel model called textttNarrationDep, which focuses on detecting narratives associated with depression.
textttNarrationDep is a deep learning framework that jointly models individual user tweet representations and clusters of users' tweets.
arXiv Detail & Related papers (2024-07-24T11:24:25Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Twitter Referral Behaviours on News Consumption with Ensemble Clustering
of Click-Stream Data in Turkish Media [2.9005223064604078]
This study investigates the readers' click activities in the organizations' websites to identify news consumption patterns following referrals from Twitter.
The investigation is widened to a broad perspective by linking the log data with news content to enrich the insights.
arXiv Detail & Related papers (2022-02-04T09:57:13Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - The Surprising Performance of Simple Baselines for Misinformation
Detection [4.060731229044571]
We examine the performance of a broad set of modern transformer-based language models.
We present our framework as a baseline for creating and evaluating new methods for misinformation detection.
arXiv Detail & Related papers (2021-04-14T16:25:22Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - Learning to Match Jobs with Resumes from Sparse Interaction Data using
Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.