Grounding Toxicity in Real-World Events across Languages
- URL: http://arxiv.org/abs/2405.13754v1
- Date: Wed, 22 May 2024 15:38:53 GMT
- Title: Grounding Toxicity in Real-World Events across Languages
- Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen
- Abstract summary: Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online.
We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages.
We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities.
- Score: 2.5398014196797605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish, and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon shaped by many interacting factors that still need to be investigated. We will release the data and code for further research.
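The abstract outlines the core pipeline: score each comment for toxicity (and sentiment/emotion), then aggregate per event and per language community. Below is a minimal sketch of that aggregation step, assuming a pandas DataFrame of comments and the Detoxify multilingual classifier as the scorer; the file name and column names ("event", "language", "body") are hypothetical, since the abstract does not specify the tooling.

```python
# Minimal sketch: per-event, per-language toxicity aggregation over Reddit comments.
# The input file, column names, and choice of classifier are illustrative
# assumptions, not the authors' actual pipeline.
import pandas as pd
from detoxify import Detoxify  # one possible multilingual toxicity classifier

_model = Detoxify("multilingual")

def score_toxicity(texts: list[str]) -> list[float]:
    """Return a toxicity probability in [0, 1] for each comment."""
    # For millions of comments this would be done in batches; kept simple here.
    return _model.predict(texts)["toxicity"]

# comments.csv is assumed to hold one Reddit comment per row.
comments = pd.read_csv("comments.csv")
comments["toxicity"] = score_toxicity(comments["body"].tolist())

# Mean toxicity and share of comments above a 0.5 threshold, per event and language.
summary = (
    comments.groupby(["event", "language"])["toxicity"]
    .agg(mean_toxicity="mean", toxic_share=lambda s: (s > 0.5).mean())
    .reset_index()
)
print(summary.sort_values("mean_toxicity", ascending=False).head())
```

The same grouping pattern would extend to negative sentiment and emotion expressions by swapping in a different scoring function.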
Related papers
- Polarized Patterns of Language Toxicity and Sentiment of Debunking Posts on Social Media [5.301808480190602]
The rise of misinformation and fake news in online political discourse poses significant challenges to democratic processes and public engagement.
We examined over 86 million debunking tweets and more than 4 million Reddit debunking comments to investigate the relationship between language toxicity, pessimism, and social polarization in debunking efforts.
We show that platform architecture affects the informational complexity of user interactions, with Twitter promoting concentrated, uniform discourse and Reddit encouraging diverse, complex communication.
arXiv Detail & Related papers (2025-01-10T08:00:58Z) - Multilingual and Explainable Text Detoxification with Parallel Corpora [58.83211571400692]
We extend the parallel text detoxification corpus to new languages.
We conduct a first-of-its-kind automated, explainable analysis of the descriptive features of toxic and non-toxic sentences.
We then experiment with a novel text detoxification method inspired by the Chain-of-Thought reasoning approach.
arXiv Detail & Related papers (2024-12-16T12:08:59Z) - Characterization of Political Polarized Users Attacked by Language Toxicity on Twitter [3.0367864044156088]
This study aims to provide a first exploration of the potential language toxicity flow among Left, Right and Center users.
More than 500M Twitter posts were examined.
It was discovered that Left users received much more toxic replies than Right and Center users.
arXiv Detail & Related papers (2024-07-17T10:49:47Z) - Tracking Patterns in Toxicity and Antisocial Behavior Over User Lifetimes on Large Social Media Platforms [0.2630859234884723]
We analyze toxicity over a 14-year time span on nearly 500 million comments from Reddit and Wikipedia.
We find that the most toxic behavior on Reddit is exhibited in aggregate by the most active users, while the most toxic behavior on Wikipedia is exhibited in aggregate by the least active users.
arXiv Detail & Related papers (2024-07-12T15:45:02Z) - The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages [2.5398014196797605]
Toxic language remains an ongoing challenge on social media platforms.
This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations.
arXiv Detail & Related papers (2024-04-29T14:14:33Z) - Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z) - Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z) - User Engagement and the Toxicity of Tweets [1.1339580074756188]
We analyze a random sample of more than 85,300 Twitter conversations to examine differences between toxic and non-toxic conversations.
We find that toxic conversations, those with at least one toxic tweet, are longer but have fewer individual users contributing to the dialogue compared to the non-toxic conversations.
We also find a relationship between the toxicity of the first reply to a toxic tweet and the toxicity of the conversation.
arXiv Detail & Related papers (2022-11-07T20:55:22Z) - Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z) - When a crisis strikes: Emotion analysis and detection during COVID-19 [96.03869351276478]
We present CovidEmo, a dataset of 1K tweets labeled with emotions.
We examine how well large pre-trained language models generalize across domains and crises.
arXiv Detail & Related papers (2021-07-23T04:07:14Z) - Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions [44.92240076313168]
We analyze the affective trajectories of the American people and the Chinese people based on Twitter and Weibo posts between January 20th, 2020 and May 11th 2020.
By contrasting two very different countries, China and the United States, we reveal sharp differences in people's views on COVID-19 in different cultures.
Our study provides a computational approach to unveiling public emotions and concerns about the pandemic in real time, which could potentially help policy-makers better understand people's needs and thus make optimal policies.
arXiv Detail & Related papers (2020-05-29T09:24:38Z) - Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)