TIMME: Twitter Ideology-detection via Multi-task Multi-relational
Embedding
- URL: http://arxiv.org/abs/2006.01321v3
- Date: Thu, 18 Jun 2020 05:08:45 GMT
- Title: TIMME: Twitter Ideology-detection via Multi-task Multi-relational
Embedding
- Authors: Zhiping Xiao, Weiping Song, Haoyan Xu, Zhicheng Ren, Yizhou Sun
- Abstract summary: We aim at solving the problem of predicting people's ideology, or political tendency.
We estimate it by using Twitter data, and formalize it as a classification problem.
- Score: 26.074367752142198
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We aim at solving the problem of predicting people's ideology, or political
tendency. We estimate it by using Twitter data, and formalize it as a
classification problem. Ideology-detection has long been a challenging yet
important problem. Certain groups, such as policy makers, rely on it to
make wise decisions. Back when labor-intensive survey studies
were needed to collect public opinion, analyzing ordinary citizens' political
tendencies was difficult. The rise of social media, such as Twitter, has enabled
us to gather ordinary citizens' data easily. However, the incompleteness of the
labels and the features in social network datasets is tricky to handle, not to
mention the enormous data size and the heterogeneity. The data differ dramatically
from many commonly used datasets, and thus bring unique challenges. In our work,
we first built our own datasets from Twitter. Next, we proposed TIMME, a
multi-task multi-relational embedding model that works efficiently on
sparsely labeled, heterogeneous real-world datasets. It can also handle the
incompleteness of the input features. Experimental results showed that TIMME is
overall better than the state-of-the-art models for ideology detection on
Twitter. Our findings include: links can lead to good classification outcomes
without text; the conservative voice is under-represented on Twitter; follow is
the most important relation for predicting ideology; retweet and mention raise
the chance of like, etc. Last but not least, TIMME could in theory be extended
to other datasets and tasks.
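The abstract does not spell out TIMME's architecture, so the following is only a toy sketch, under stated assumptions, of what a multi-task multi-relational embedding could look like. It is not the authors' implementation. Assumptions: each relation named in the abstract (follow, retweet, mention, like) has its own adjacency matrix and weight matrix, one-hot identity vectors stand in for missing user features, and the shared embedding feeds a node-classification head and a dot-product link-prediction head. All names, sizes, and the random graph are hypothetical.

```python
import numpy as np

def normalize(adj):
    """Row-normalize an adjacency matrix; isolated nodes keep an all-zero row."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

def multi_relational_layer(h, adjs, rel_weights, self_weight):
    """Sum a self term and one neighbor-average term per relation, then apply ReLU."""
    out = h @ self_weight
    for name, adj in adjs.items():
        out = out + normalize(adj) @ h @ rel_weights[name]
    return np.maximum(out, 0.0)

def softmax(x):
    """Numerically stable row-wise softmax."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_users, dim, n_classes = 6, 4, 2                 # hypothetical sizes
relations = ["follow", "retweet", "mention", "like"]

# Hypothetical random graph: one directed adjacency matrix per relation.
adjs = {r: (rng.random((n_users, n_users)) < 0.3).astype(float) for r in relations}

# Assumption: one-hot identity features stand in for incomplete user features.
x = np.eye(n_users)

rel_weights = {r: rng.normal(scale=0.1, size=(n_users, dim)) for r in relations}
self_weight = rng.normal(scale=0.1, size=(n_users, dim))
emb = multi_relational_layer(x, adjs, rel_weights, self_weight)  # shared embedding

# Task 1: ideology classification from the shared embedding.
w_cls = rng.normal(scale=0.1, size=(dim, n_classes))
class_probs = softmax(emb @ w_cls)

# Task 2: link prediction, scoring every user pair with a sigmoid of the dot product.
link_scores = 1.0 / (1.0 + np.exp(-(emb @ emb.T)))

print(class_probs.shape, link_scores.shape)
```

The weights here are untrained. In a multi-task setup, the losses of both heads would be combined so that the abundant link structure can inform the sparsely labeled classification task.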
Related papers
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Detecting Political Opinions in Tweets through Bipartite Graph Analysis: A Skip Aggregation Graph Convolution Approach [9.350629400940493]
We focus on the 2020 US presidential election and create a large-scale dataset from Twitter.
To detect political opinions in tweets, we build a user-tweet bipartite graph based on users' posting and retweeting behaviors.
We introduce a novel skip aggregation mechanism that makes tweet nodes aggregate information from second-order neighbors.
arXiv Detail & Related papers (2023-04-22T10:38:35Z)
- DoubleH: Twitter User Stance Detection via Bipartite Graph Neural Networks [9.350629400940493]
We crawl a large-scale dataset of the 2020 US presidential election and automatically label all users using manually tagged hashtags.
We propose a bipartite graph neural network model, DoubleH, which aims to better utilize homogeneous and heterogeneous information in user stance detection tasks.
arXiv Detail & Related papers (2023-01-20T19:20:10Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Decay No More: A Persistent Twitter Dataset for Learning Social Meaning [10.227026799075215]
We propose a new persistent English Twitter dataset for social meaning (PTSM).
PTSM consists of 17 social meaning datasets in 10 categories of tasks.
We experiment with two SOTA pre-trained language models and show that our PTSM can substitute the actual tweets with paraphrases with marginal performance loss.
arXiv Detail & Related papers (2022-04-10T06:07:54Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework called "PERS".
Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
- Sentiment Analysis on Social Media Content [0.0]
The aim of this paper is to present a model that can perform sentiment analysis of real data collected from Twitter.
Twitter data is highly unstructured, which makes it difficult to analyze.
Our proposed model differs from prior work in this field because it combines supervised and unsupervised machine learning algorithms.
arXiv Detail & Related papers (2020-07-04T17:03:30Z)
- Stance in Replies and Quotes (SRQ): A New Dataset For Learning Stance in Twitter Conversations [8.097870074875729]
We present the largest human-labeled stance dataset for Twitter conversations with over 5200 stance labels.
We include many baseline models for learning the stance in conversations and compare the performance of various models.
arXiv Detail & Related papers (2020-06-01T03:30:08Z)
- Contrastive Examples for Addressing the Tyranny of the Majority [83.93825214500131]
We propose to create a balanced training dataset, consisting of the original dataset plus new data points in which the group memberships are intervened.
We show that current generative adversarial networks are a powerful tool for learning these data points, called contrastive examples.
arXiv Detail & Related papers (2020-04-14T14:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.