A Dataset of State-Censored Tweets
- URL: http://arxiv.org/abs/2101.05919v3
- Date: Fri, 19 Mar 2021 18:40:27 GMT
- Title: A Dataset of State-Censored Tweets
- Authors: Tuğrulcan Elmas, Rebekah Overdorf, Karl Aberer
- Abstract summary: We release a dataset of 583,437 tweets by 155,715 users that were censored between 2012 and July 2020.
We also release 4,301 accounts that were censored in their entirety.
Our dataset will not only aid in the study of government censorship but will also aid in studying hate speech detection and the effect of censorship on social media users.
- Score: 3.0254442724635173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many governments impose traditional censorship methods on social media
platforms. Instead of removing censored content completely, many social media companies,
including Twitter, only withhold the content from the requesting country. This
makes such content still accessible outside of the censored region, allowing
for an excellent setting in which to study government censorship on social
media. We mine such content using the Internet Archive's Twitter Stream Grab.
We release a dataset of 583,437 tweets by 155,715 users that were censored
between 2012 and July 2020. We also release 4,301 accounts that were censored in
their entirety. Additionally, we release a set of 22,083,759 supplemental
tweets made up of all tweets by users with at least one censored tweet as well
as instances of other users retweeting the censored user. We provide an
exploratory analysis of this dataset. Our dataset will not only aid in the
study of government censorship but will also aid in studying hate speech
detection and the effect of censorship on social media users. The dataset is
publicly available at https://doi.org/10.5281/zenodo.4439509
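The mining step described in the abstract can be illustrated with a minimal sketch. It assumes the archived stream files hold one JSON object per line in the Twitter API v1.1 format, where a withheld tweet or account carries a `withheld_in_countries` list of country codes. Whether the authors used exactly these fields is an assumption, and the function name and sample records below are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_withheld_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a small record for every tweet withheld in at least one country.

    Assumes one JSON object per line (Twitter API v1.1 style). The function
    name and the output keys are illustrative, not taken from the paper.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        # Stream files also hold non-tweet records (e.g. delete notices),
        # which have no top-level tweet id.
        if not isinstance(obj, dict) or "id_str" not in obj:
            continue
        tweet_countries = obj.get("withheld_in_countries") or []
        user_countries = (obj.get("user") or {}).get("withheld_in_countries") or []
        if tweet_countries or user_countries:
            yield {
                "id": obj["id_str"],
                "tweet_withheld_in": sorted(tweet_countries),
                "user_withheld_in": sorted(user_countries),
            }


# Hypothetical sample lines standing in for an archived stream file.
sample = [
    '{"id_str": "1", "text": "hello", "user": {"id_str": "10"}}',
    '{"id_str": "2", "text": "x", "withheld_in_countries": ["TR", "DE"], "user": {"id_str": "11"}}',
    '{"delete": {"status": {"id_str": "3"}}}',
    '{"id_str": "4", "text": "y", "user": {"id_str": "12", "withheld_in_countries": ["IN"]}}',
    "not json",
]
results = list(iter_withheld_tweets(sample))
```

The sketch separates tweet-level from account-level withholding, mirroring the abstract's distinction between censored tweets and accounts censored in their entirety. It does not inspect nested `retweeted_status` objects, which a fuller pipeline would probably need to do.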
Related papers
- Russo-Ukrainian War: Prediction and explanation of Twitter suspension [47.61306219245444]
This study focuses on the Twitter suspension mechanism and the analysis of shared content and features of user accounts that may lead to this.
We have obtained a dataset containing 107.7M tweets, originating from 9.8 million users, using Twitter API.
Our results reveal scam campaigns taking advantage of trending topics regarding the Russia-Ukrainian conflict for Bitcoin fraud, spam, and advertisement campaigns.
arXiv Detail & Related papers (2023-06-06T08:41:02Z)
- Detecting and Reasoning of Deleted Tweets before they are Posted [5.300190188468289]
We identify deleted tweets, particularly within the Arabic context, and label them with a corresponding fine-grained disinformation category.
We then develop models that can predict the potentiality of tweets getting deleted, as well as the potential reasons behind deletion.
arXiv Detail & Related papers (2023-05-05T08:25:07Z)
- Predicting Hate Intensity of Twitter Conversation Threads [26.190359413890537]
We propose DRAGNET++, which aims to predict the intensity of hatred that a tweet can bring in through its reply chain in the future.
It uses the semantic and propagating structure of the tweet threads to maximize the contextual information leading up to and the fall of hate intensity at each subsequent tweet.
We show that DRAGNET++ outperforms all the state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-06-16T18:51:36Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from Twitter API.
The dataset has reached the amount of 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis, while the dataset can be used to further exploratory investigation towards topic analysis, hate speech, propaganda recognition, or even show potential malicious entities like botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Retweet communities reveal the main sources of hate speech [0.6999740786886536]
We deploy advanced deep learning to produce high quality hate speech classification models.
We create retweet networks, detect communities and monitor their evolution through time.
Hate speech is dominated by offensive tweets, related to political and ideological issues.
About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size.
arXiv Detail & Related papers (2021-05-31T11:43:19Z)
- "I Won the Election!": An Empirical Analysis of Soft Moderation Interventions on Twitter [0.9391375268580806]
We study the users who share tweets with warning labels on Twitter and their political leaning.
We find that 72% of the tweets with warning labels are shared by Republicans, while only 11% are shared by Democrats.
arXiv Detail & Related papers (2021-01-18T17:39:58Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
- Whose Tweets are Surveilled for the Police: An Audit of Social-Media Monitoring Tool via Log Files [69.02688684221265]
We obtained log files from the Corvallis (Oregon) Police Department's use of social media monitoring software called DigitalStakeout.
These log files include the results of proprietary searches by DigitalStakeout that were running over a period of 13 months and include 7240 social media posts.
We observe differences in the demographics of the users whose Tweets are flagged by DigitalStakeout compared to the demographics of the Twitter users in the region.
arXiv Detail & Related papers (2020-01-23T19:35:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.