A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality
- URL: http://arxiv.org/abs/2012.05444v1
- Date: Thu, 10 Dec 2020 04:06:25 GMT
- Title: A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality
- Authors: Shubhanshu Mishra, Daniel Collier
- Abstract summary: We introduce a framework for annotating social media text corpora for various categories.
We use a case study of a Facebook comment corpus on student loan discussions, which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberalistic views, and civility of the comment.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper we introduce a framework for annotating social media
text corpora for various categories. Since social media data is generated by
individuals, it is important to annotate the text for the individuals'
demographic attributes to enable a socio-technical analysis of the corpora.
Furthermore, when analyzing a large dataset, we can often annotate a small
sample of the data and then train a prediction model on this sample to annotate
the full data for the relevant categories. We use a case study of a Facebook
comment corpus on student loan discussions, which was annotated for gender,
military affiliation, age group, political leaning, race, stance, topicality,
neoliberalistic views, and civility of the comment. We release three datasets of
Facebook comments for further research at:
https://github.com/socialmediaie/StudentDebtFbComments
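The abstract's workflow of hand-annotating a small sample and then training a model to annotate the rest of the corpus can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: each comment is assumed to be a dict with a hypothetical `text` field plus one key per category, the classifier is a from-scratch multinomial naive Bayes used purely as a stand-in, and the toy comments and `support`/`oppose` labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    A stand-in model for illustration only; the paper's own
    prediction models are not specified in this abstract.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def annotate_corpus(sample, unlabeled, categories):
    """Train one model per category on the hand-labeled sample, then
    predict that category for every comment in the unlabeled data."""
    annotated = [dict(row) for row in unlabeled]  # copy; inputs stay untouched
    texts = [row["text"] for row in sample]
    for category in categories:
        model = NaiveBayes().fit(texts, [row[category] for row in sample])
        for row in annotated:
            row[category] = model.predict(row["text"])
    return annotated


# Hypothetical toy data, not drawn from the released datasets.
SAMPLE = [
    {"text": "Cancel student debt now, it is crushing us", "stance": "support"},
    {"text": "Forgiving loans is unfair to people who paid", "stance": "oppose"},
    {"text": "Student debt relief would help my whole family", "stance": "support"},
    {"text": "Taxpayers should not pay for loan forgiveness", "stance": "oppose"},
]
UNLABELED = [
    {"text": "Debt relief would help so many families"},
    {"text": "Forgiveness is unfair to taxpayers"},
]

annotated = annotate_corpus(SAMPLE, UNLABELED, ["stance"])
for row in annotated:
    print(row["stance"], "|", row["text"])
```

Training a separate model per category keeps attributes such as stance and civility independent of each other. With a real corpus, holding out part of the hand-labeled sample to measure prediction quality would indicate how far the automatic annotations can be trusted before they are applied to the full data.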