ArabGend: Gender Analysis and Inference on Arabic Twitter
- URL: http://arxiv.org/abs/2203.00271v1
- Date: Tue, 1 Mar 2022 07:13:09 GMT
- Title: ArabGend: Gender Analysis and Inference on Arabic Twitter
- Authors: Hamdy Mubarak, Shammur Absar Chowdhury, Firoj Alam
- Abstract summary: We perform an extensive analysis of differences between male and female users on the Arabic Twitter-sphere.
Along with gender analysis, we also propose a method to infer gender by utilizing usernames, profile pictures, tweets, and networks of friends.
Our proposed gender inference method achieves an F1 score of 82.1%, which is 47.3% higher than the majority baseline.
- Score: 8.373984536015842
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Gender analysis of Twitter can reveal important socio-cultural differences
between male and female users. There has been significant effort in the past to
analyze and automatically infer gender for the content of most widely spoken
languages; however, to our knowledge, very limited work has been done for Arabic.
In this paper, we perform an extensive analysis of differences between male and
female users on the Arabic Twitter-sphere. We study differences in user
engagement, topics of interest, and the gender gap in professions. Along with
gender analysis, we also propose a method to infer gender by utilizing
usernames, profile pictures, tweets, and networks of friends. To do so, we
manually annotated gender and location for ~166K Twitter accounts associated
with ~92K user locations, which we plan to make publicly available at
http://anonymous.com. Our proposed gender inference method achieves an F1 score
of 82.1%, which is 47.3% higher than the majority baseline. In addition, we also
developed a demo and made it publicly available.
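The gap between the reported 82.1% F1 and the majority baseline (82.1 - 47.3 = 34.8%) can be made concrete with a small, self-contained sketch. This is illustrative only, not the authors' implementation: it computes macro-averaged F1 on toy labels and shows why a classifier that always predicts the majority gender scores poorly, since the minority class contributes an F1 of zero.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall (0 when undefined)."""
    if tp == 0:
        return 0.0
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every class present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

# Toy gender labels with an imbalanced male/female split (hypothetical numbers).
y_true = ["M"] * 8 + ["F"] * 2
majority = Counter(y_true).most_common(1)[0][0]
y_majority = [majority] * len(y_true)
print(round(macro_f1(y_true, y_majority), 3))  # → 0.444
```

With this toy split the always-majority predictor reaches only 0.444 macro F1, which illustrates why the paper reports its gain relative to that baseline.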
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Gender inference: can chatGPT outperform common commercial tools? [0.0]
We compare the performance of a generative Artificial Intelligence (AI) tool ChatGPT with three commercially available list-based and machine learning-based gender inference tools.
Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name and first and last name) impact the accuracy of their predictions.
ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available.
arXiv Detail & Related papers (2023-11-24T22:09:14Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Voices of Her: Analyzing Gender Differences in the AI Publication World [26.702520904075044]
We identify several gender differences using the AI Scholar dataset of 78K researchers in the field of AI.
Female first-authored papers show distinct linguistic styles, such as longer text, more positive emotion words, and more catchy titles.
Our analysis provides a window into the current demographic trends in our AI community, and encourages more gender equality and diversity in the future.
arXiv Detail & Related papers (2023-05-24T00:40:49Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses [17.253633576291897]
We introduce a new corpus for gender identification and rewriting in contexts involving one or two target users.
We focus on Arabic, a gender-marking morphologically rich language.
arXiv Detail & Related papers (2021-10-18T12:06:17Z)
- 2020 U.S. Presidential Election: Analysis of Female and Male Users on Twitter [8.651122862855495]
Current literature mainly focuses on analyzing the content of tweets without considering the gender of users.
This research collects and analyzes a large number of tweets posted during the 2020 U.S. presidential election.
Our findings are based upon a wide range of topics, such as tax, climate change, and the COVID-19 pandemic.
arXiv Detail & Related papers (2021-08-21T01:31:03Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
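The 1% word error rate cited above is the standard word-level edit-distance metric. A minimal sketch of how it is computed (illustrative only, not the paper's evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic program over the edit-distance table.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        diag = row[0]          # d[i-1][j-1]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            up = row[j]        # d[i-1][j]
            row[j] = min(up + 1,            # deletion
                         row[j - 1] + 1,    # insertion
                         diag + (r != h))   # substitution (or match)
            diag = up
    return row[-1] / len(ref)

# One substituted word out of four gives a 25% WER.
print(word_error_rate("everyone did their best", "everyone did his best"))  # → 0.25
```

A 1% WER thus means roughly one inserted, deleted, or substituted word per hundred reference words.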
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Is Japanese gendered language used on Twitter? A large scale study [0.0]
This study starts from a collection of 408 million Japanese tweets from 2015 to 2019 and an additional sample of 2,355 Twitter account timelines manually classified by gender and category (politicians, musicians, etc.).
A large-scale textual analysis is performed on this corpus to identify and examine sentence-final particles (SFPs) and first-person pronouns appearing in the texts.
It turns out that gendered language is in fact also used on Twitter, in about 6% of the tweets, and that the prescriptive classification into "male" and "female" language does not always meet expectations, with remarkable exceptions.
arXiv Detail & Related papers (2020-06-29T11:07:10Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.