Predicting gender of Brazilian names using deep learning
- URL: http://arxiv.org/abs/2106.10156v1
- Date: Fri, 18 Jun 2021 14:45:59 GMT
- Title: Predicting gender of Brazilian names using deep learning
- Authors: Rosana C. B. Rego, Verônica M. L. Silva
- Abstract summary: Some machine learning algorithms can satisfactorily perform the prediction.
A dataset of Brazilian names is used to train and evaluate the models.
Some models accurately predict the gender in more than 90% of the cases.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting gender from a name is not a simple task. In many applications,
especially in the natural language processing (NLP) field, this task may be
necessary, mainly when considering foreign names. Some machine learning
algorithms can perform the prediction satisfactorily. In this paper, we
examined and implemented feedforward and recurrent deep neural network models,
such as MLP, RNN, GRU, CNN, and BiLSTM, to classify gender from the first
name. A dataset of Brazilian names is used to train and evaluate the models. We
analyzed the accuracy, recall, precision, and confusion matrix to measure the
models' performance. The results indicate that gender prediction can be
performed with a feature extraction strategy that treats the names as a set of
strings. Some models accurately predict the gender in more than 90% of the
cases. The recurrent models outperform the feedforward models in this binary
classification problem.
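As a rough illustration of the two ideas mentioned in the abstract (treating a name as a string of characters, and scoring a binary classifier with accuracy, precision, and recall), here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the preprocessing, so the padding scheme, the `PAD`/`UNK` indices, and the helper names `build_vocab`, `encode`, and `binary_metrics` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

PAD = 0  # assumed index reserved for padding
UNK = 1  # assumed index for characters not seen when building the vocabulary


def build_vocab(names: Sequence[str]) -> Dict[str, int]:
    """Map every character seen in the names to an integer index (starting at 2)."""
    chars = sorted({ch for name in names for ch in name.lower()})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(name: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a name into a fixed-length list of character indices.

    Long names are truncated to max_len; short names are right-padded with PAD.
    """
    ids = [vocab.get(ch, UNK) for ch in name.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix counts and derived metrics for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

The integer sequences produced by `encode` are the kind of input that an embedding layer followed by a recurrent or convolutional model would typically consume; which label is encoded as 1 is an arbitrary choice in this sketch.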
Related papers
- Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions [5.896505047270243]
We introduce an additional gender category, i.e., "neutral", to study and address potential gender biases in large language models.
We investigate the impact of adding birth years to enhance the accuracy of gender prediction.
arXiv Detail & Related papers (2024-07-07T05:59:09Z)
- Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models.
arXiv Detail & Related papers (2024-01-23T17:58:38Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- First the worst: Finding better gender translations during beam search [19.921216907778447]
We focus on gender bias resulting from systematic errors in grammatical gender translation.
We experiment with reranking nbest lists using gender features obtained automatically from the source sentence.
We find that a combination of these techniques allows large gains in WinoMT accuracy without requiring additional bilingual data or an additional NMT model.
arXiv Detail & Related papers (2021-04-15T12:53:30Z)
- What's in a Name? -- Gender Classification of Names with Character Based Machine Learning Models [6.805167389805055]
We consider the problem of predicting the gender of registered users based on their declared name.
By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings.
arXiv Detail & Related papers (2021-02-07T01:01:32Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
Undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
- Gender Prediction Based on Vietnamese Names with Machine Learning Techniques [2.7528170226206443]
We propose a new dataset for gender prediction based on Vietnamese names.
This dataset comprises over 26,000 full names annotated with genders.
This paper describes six machine learning algorithms and a deep learning model (LSTM) with fastText word embedding for gender prediction on Vietnamese names.
arXiv Detail & Related papers (2020-10-21T09:25:48Z)
- Investigating Gender Bias in BERT [22.066477991442003]
We analyse the gender bias that BERT induces in five downstream tasks related to emotion and sentiment intensity prediction.
We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer.
Experiments show that removing embedding components in such directions achieves great success in reducing BERT-induced bias in the downstream tasks.
arXiv Detail & Related papers (2020-09-10T17:38:32Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.