Is Japanese gendered language used on Twitter ? A large scale study
- URL: http://arxiv.org/abs/2006.15935v2
- Date: Thu, 9 Jul 2020 08:59:17 GMT
- Title: Is Japanese gendered language used on Twitter ? A large scale study
- Authors: Tiziana Carpi and Stefano Maria Iacus
- Abstract summary: This study starts from a collection of 408 million Japanese tweets from 2015 to 2019 and an additional sample of 2355 Twitter account timelines manually classified by gender and category (politicians, musicians, etc.).
A large scale textual analysis is performed on this corpus to identify and examine sentence-final particles (SFPs) and first-person pronouns appearing in the texts.
It turns out that gendered language is in fact also used on Twitter, in about 6% of the tweets, and that the prescriptive classification into "male" and "female" language does not always match expectations, with remarkable exceptions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study analyzes the usage of Japanese gendered language on Twitter,
starting from a collection of 408 million Japanese tweets from 2015 till 2019
and an additional sample of 2355 Twitter account timelines manually classified
by gender and category (politicians, musicians, etc.). A large scale textual
analysis is performed on this corpus to identify and examine sentence-final
particles (SFPs) and first-person pronouns appearing in the texts. It turns out
that gendered language is in fact used also on Twitter, in about 6% of the
tweets, and that the prescriptive classification into "male" and "female"
language does not always meet the expectations, with remarkable exceptions.
Further, SFPs and pronouns show increasing or decreasing trends, indicating an
evolution of the language used on Twitter.
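The kind of textual analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the marker lists `FIRST_PERSON` and `SENTENCE_FINAL`, the gender labels attached to them, and the helper names `find_markers` and `gendered_share` are all assumptions chosen for the example. The labels follow the traditional prescriptive classification, which the study itself reports does not always hold.

```python
import re

# Illustrative marker lists (assumptions for this sketch, not the paper's inventory).
# Labels follow the traditional prescriptive "male"/"female" classification.
FIRST_PERSON = {"俺": "male", "僕": "male", "あたし": "female"}
SENTENCE_FINAL = {"ぜ": "male", "ぞ": "male", "かしら": "female", "わ": "female"}

# Sentence boundaries: Japanese/ASCII terminal punctuation and newlines.
_SENTENCE_SPLIT = re.compile(r"[。！!？?\n]+")
# Decorative characters that often trail a sentence in tweets.
_TRAILING = re.compile(r"[\s…〜~♪]+$")


def find_markers(tweet):
    """Return (marker, label) pairs for pronouns and sentence-final particles."""
    found = []
    # Naive substring match; a real pipeline would likely use a
    # morphological tokenizer to avoid matching inside longer words.
    for pronoun, label in FIRST_PERSON.items():
        if pronoun in tweet:
            found.append((pronoun, label))
    for sentence in _SENTENCE_SPLIT.split(tweet):
        sentence = _TRAILING.sub("", sentence)
        # Try longer particles first so "かしら" is not shadowed by a shorter one.
        for particle in sorted(SENTENCE_FINAL, key=len, reverse=True):
            if sentence.endswith(particle):
                found.append((particle, SENTENCE_FINAL[particle]))
                break
    return found


def gendered_share(tweets):
    """Fraction of tweets containing at least one listed marker."""
    if not tweets:
        return 0.0
    hits = sum(1 for t in tweets if find_markers(t))
    return hits / len(tweets)
```

Applied to a corpus, `gendered_share` gives a figure comparable in spirit to the "about 6% of the tweets" reported in the abstract, although the actual number depends entirely on which markers are included and how sentence endings are detected.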
Related papers
- 'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs [4.021517742561241]
This study explores implicit gender biases in Hindi text generation and compares them to those in English.
Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation.
This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.
arXiv Detail & Related papers (2024-09-20T13:16:58Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- ArabGend: Gender Analysis and Inference on Arabic Twitter [8.373984536015842]
We perform an extensive analysis of differences between male and female users on the Arabic Twitter-sphere.
Along with gender analysis, we also propose a method to infer gender by utilizing usernames, profile pictures, tweets, and networks of friends.
Our proposed gender inference method achieves an F1 score of 82.1%, which is 47.3% higher than the majority baseline.
arXiv Detail & Related papers (2022-03-01T07:13:09Z)
- The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses [17.253633576291897]
We introduce a new corpus for gender identification and rewriting in contexts involving one or two target users.
We focus on Arabic, a gender-marking morphologically rich language.
arXiv Detail & Related papers (2021-10-18T12:06:17Z)
- 2020 U.S. Presidential Election: Analysis of Female and Male Users on Twitter [8.651122862855495]
Current literature mainly focuses on analyzing the content of tweets without considering the gender of users.
This research collects and analyzes a large number of tweets posted during the 2020 U.S. presidential election.
Our findings are based upon a wide range of topics, such as tax, climate change, and the COVID-19 pandemic.
arXiv Detail & Related papers (2021-08-21T01:31:03Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as "dead" and "designated", are associated with both male and female politicians, a few specific words, such as "beautiful" and "divorced", are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Understanding the Hoarding Behaviors during the COVID-19 Pandemic using Large Scale Social Media Data [77.34726150561087]
We analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020.
We find the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population.
The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean.
arXiv Detail & Related papers (2020-10-15T16:02:25Z)
- Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation [77.69102711230248]
We propose the use of a multilingual transformer model that we pre-train over English tweets, applying data augmentation through automatic translation to adapt the model to non-English languages.
Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z)
- LEBANONUPRISING: a thorough study of Lebanese tweets [0.0]
On October 17, Lebanon witnessed the start of a revolution; the LebanonUprising hashtag became viral on Twitter.
A dataset consisting of a 100,0000 tweets was collected between 18 and 21 October.
We conducted a sentiment analysis study for the tweets in spoken Lebanese Arabic related to the LebanonUprising hashtag using different machine learning algorithms.
arXiv Detail & Related papers (2020-09-30T05:50:08Z)
- #MeToo on Campus: Studying College Sexual Assault at Scale Using Data Reported on Social Media [71.74529365205053]
We analyze the influence of the #MeToo trend on a pool of college followers.
The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories.
There exists a significant correlation between the prevalence of this trend and official reports on several major geographical regions.
arXiv Detail & Related papers (2020-01-16T18:05:46Z)