What's in a Name? -- Gender Classification of Names with Character Based
Machine Learning Models
- URL: http://arxiv.org/abs/2102.03692v1
- Date: Sun, 7 Feb 2021 01:01:32 GMT
- Authors: Yifan Hu, Changwei Hu, Thanh Tran, Tejaswi Kasturi, Elizabeth Joseph,
Matt Gillingham
- Abstract summary: We consider the problem of predicting the gender of registered users based on their declared name.
By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings.
- Score: 6.805167389805055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gender information is no longer a mandatory input when registering for an
account at many leading Internet companies. However, prediction of demographic
information such as gender and age remains an important task, especially in
intervening against unintentional gender/age bias in recommender systems. Therefore,
it is necessary to infer the gender of those users who did not provide this
information during registration. We consider the problem of predicting the
gender of registered users based on their declared name. By analyzing the first
names of 100M+ users, we found that genders can be very effectively classified
using the composition of the name strings. We propose a number of character
based machine learning models, and demonstrate that our models are able to
infer the gender of users with much higher accuracy than baseline models.
Moreover, we show that using the last names in addition to the first names
improves classification performance further.
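The abstract does not spell out the models, so the following is only a minimal sketch of the general idea of classifying a name from its character composition. It is not the authors' method: it swaps in a simple multinomial naive Bayes over character n-grams, and the function `char_ngrams`, the class `NgramNaiveBayes`, the `^`/`$` boundary markers, and the toy training names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Character n-grams of a name, with ^ and $ marking its start and end."""
    padded = "^" + name.strip().lower() + "$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)      # names seen per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        for name, label in zip(names, labels):
            self.gram_counts[label].update(char_ngrams(name))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, name):
        n_names = sum(self.class_counts.values())
        scores = {}
        for label, counts in self.gram_counts.items():
            total = sum(counts.values())
            score = math.log(self.class_counts[label] / n_names)
            for gram in char_ngrams(name):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((counts[gram] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper.
train_names = ["anna", "maria", "julia", "john", "mark", "peter"]
train_labels = ["F", "F", "F", "M", "M", "M"]
model = NgramNaiveBayes().fit(train_names, train_labels)
print(model.predict("sofia"))
```

The boundary markers let the model treat a suffix such as `a$` differently from the same letter in the middle of a name, which is one plausible reason name composition can be informative. Appending the last name to the input string would be one way to experiment with the abstract's observation that last names help.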
Related papers
- Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions [5.896505047270243]
We introduce an additional gender category, i.e., "neutral", to study and address potential gender biases in large language models.
We investigate the impact of adding birth years to enhance the accuracy of gender prediction.
arXiv Detail & Related papers (2024-07-07T05:59:09Z)
- Gender inference: can chatGPT outperform common commercial tools? [0.0]
We compare the performance of a generative Artificial Intelligence (AI) tool ChatGPT with three commercially available list-based and machine learning-based gender inference tools.
Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name and first and last name) impact the accuracy of their predictions.
ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available.
arXiv Detail & Related papers (2023-11-24T22:09:14Z)
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- VoxCeleb Enrichment for Age and Gender Recognition [12.520037579004883]
We provide speaker age labels and (an alternative) annotation of speaker gender in VoxCeleb datasets.
We demonstrate the use of this metadata by constructing age and gender recognition models.
We also compare the original VoxCeleb gender labels with our labels to identify records that might be mislabeled in the original VoxCeleb data.
arXiv Detail & Related papers (2021-09-28T06:18:57Z)
- Predicting gender of Brazilian names using deep learning [0.0]
Some machine learning algorithms can satisfactorily perform the prediction.
A dataset of Brazilian names is used to train and evaluate the models.
Some models accurately predict the gender in more than 90% of the cases.
arXiv Detail & Related papers (2021-06-18T14:45:59Z)
- Gender Prediction Based on Vietnamese Names with Machine Learning Techniques [2.7528170226206443]
We propose a new dataset for gender prediction based on Vietnamese names.
This dataset comprises over 26,000 full names annotated with genders.
This paper describes six machine learning algorithms and a deep learning model (LSTM) with fastText word embedding for gender prediction on Vietnamese names.
arXiv Detail & Related papers (2020-10-21T09:25:48Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Towards Gender-Neutral Face Descriptors for Mitigating Bias in Face Recognition [51.856693288834975]
State-of-the-art deep networks implicitly encode gender information while being trained for face recognition.
Gender is often viewed as an important attribute with respect to identifying faces.
We present a novel Adversarial Gender De-biasing algorithm (AGENDA) to reduce the gender information present in face descriptors.
arXiv Detail & Related papers (2020-06-14T08:54:03Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)