Utilizing Social Media Attributes for Enhanced Keyword Detection: An
IDF-LDA Model Applied to Sina Weibo
- URL: http://arxiv.org/abs/2306.07978v1
- Date: Tue, 30 May 2023 08:35:39 GMT
- Authors: Yifei Yue
- Abstract summary: We propose a novel method to address the keyword detection problem in social media.
Our model combines the Inverse Document Frequency (IDF) and Latent Dirichlet Allocation (LDA) models to better cope with the distinct attributes of social media data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of social media such as Twitter and Weibo,
detecting keywords from a huge volume of text data streams in real-time has
become a critical problem. The keyword detection problem aims to extract
important information from massive text data to reflect the most important
events or topics. However, social media data usually has unique features: the
documents are usually short, the language is colloquial, and the data is likely
to have significant temporal patterns. Therefore, it could be challenging to
discover critical information from these text streams. In this paper, we
propose a novel method to address the keyword detection problem in social
media. Our model combines the Inverse Document Frequency (IDF) and Latent
Dirichlet Allocation (LDA) models to better cope with the distinct attributes
of social media data, such as the number of likes, comments, and retweets. By
weighting the importance of each document based on these attributes, our method
can effectively detect more representative keywords over time. Comprehensive
experiments conducted under various conditions on Weibo data illustrate that
our approach outperforms the baselines in various evaluation metrics, including
precision and recall for multiple problem settings.
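The abstract describes weighting each document by its likes, comments, and retweets before detecting keywords. A minimal sketch of that idea follows. It is not the paper's IDF-LDA model: the LDA component is omitted, and the log-damped weight, the smoothed IDF formula, and the names `engagement_weight`, `weighted_keyword_scores`, and `top_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def engagement_weight(likes, comments, retweets):
    # Hypothetical weighting (an assumption, not taken from the paper):
    # a log-damped sum of interactions, so viral posts count more
    # without completely drowning out ordinary ones.
    return 1.0 + math.log1p(likes + comments + retweets)


def weighted_keyword_scores(posts):
    """Score terms with an engagement-weighted TF-IDF.

    posts: list of (tokens, likes, comments, retweets) tuples.
    Each post contributes its engagement weight, rather than a count
    of 1, to both the term frequency and the document frequency.
    """
    weights = [engagement_weight(l, c, r) for _, l, c, r in posts]
    total_weight = sum(weights)

    weighted_tf = Counter()
    weighted_df = Counter()
    for (tokens, _l, _c, _r), w in zip(posts, weights):
        for tok in tokens:
            weighted_tf[tok] += w
        for tok in set(tokens):
            weighted_df[tok] += w

    scores = {}
    for tok, tf in weighted_tf.items():
        # Smoothed IDF computed over weighted document mass.
        idf = math.log((1.0 + total_weight) / (1.0 + weighted_df[tok])) + 1.0
        scores[tok] = tf * idf
    return scores


def top_keywords(posts, k=3):
    scores = weighted_keyword_scores(posts)
    # Sort by descending score; break ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data (invented for illustration): two high-engagement posts about
# one event and two low-engagement everyday posts.
posts = [
    (["earthquake", "sichuan", "rescue"], 500, 120, 300),
    (["lunch", "noodles"], 2, 0, 0),
    (["earthquake", "rescue", "donate"], 80, 10, 40),
    (["lunch", "weather"], 1, 1, 0),
]
keywords = top_keywords(posts, k=2)
```

In the toy data, "earthquake" and "lunch" each appear in two posts, so an unweighted scheme would score them identically; the engagement weights are what separate them here.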
Related papers
- Robust Domain Misinformation Detection via Multi-modal Feature Alignment [49.89164555394584]
We propose a robust domain and cross-modal approach for multi-modal misinformation detection.
It reduces the domain shift by aligning the joint distribution of textual and visual modalities.
We also propose a framework that simultaneously considers application scenarios of domain generalization.
arXiv Detail & Related papers (2023-11-24T07:06:16Z) - An Attention-Based Denoising Framework for Personality Detection in
Social Media Texts [1.4887196224762684]
Personality detection based on user-generated texts is a universal method that can be used to build user portraits.
We propose an attention-based information extraction mechanism (AIEM) for long texts, which is applied to quickly locate valuable pieces of information.
We obtain an average accuracy improvement of 10.2% on the gold standard Twitter-Myers-Briggs Type Indicator dataset.
arXiv Detail & Related papers (2023-11-16T14:56:09Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Improved Topic modeling in Twitter through Community Pooling [0.0]
Twitter posts are short and often less coherent than other text documents.
We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community.
Results show that our Community pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets.
arXiv Detail & Related papers (2021-12-20T17:05:32Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study assesses existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - The Surprising Performance of Simple Baselines for Misinformation
Detection [4.060731229044571]
We examine the performance of a broad set of modern transformer-based language models.
We present our framework as a baseline for creating and evaluating new methods for misinformation detection.
arXiv Detail & Related papers (2021-04-14T16:25:22Z) - Cross-Media Keyphrase Prediction: A Unified Framework with
Multi-Modality Multi-Head Attention and Image Wordings [63.79979145520512]
We explore the joint effects of texts and images in predicting the keyphrases for a multimedia post.
We propose a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions.
Our model significantly outperforms the previous state of the art based on traditional attention networks.
arXiv Detail & Related papers (2020-11-03T08:44:18Z) - TopicBERT: A Transformer transfer learning based memory-graph approach
for multimodal streaming social media topic detection [8.338441212378587]
Social networks, with their bursty short messages and large-scale data spread across a vast variety of topics, are of interest to many researchers.
These properties of social networks, known as the 5 Vs of big data, have led to many unique and insightful algorithms and techniques applied to large social networking datasets and data streams.
arXiv Detail & Related papers (2020-08-16T10:39:50Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We present the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z) - GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [1.0681288493631977]
We introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of automatic keyphrase extraction.
GLEAKE uses single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases.
It refines the most significant phrases as a final set of keyphrases.
arXiv Detail & Related papers (2020-05-19T20:24:02Z)