Gendec: A Machine Learning-based Framework for Gender Detection from
Japanese Names
- URL: http://arxiv.org/abs/2311.11001v1
- Date: Sat, 18 Nov 2023 07:46:59 GMT
- Title: Gendec: A Machine Learning-based Framework for Gender Detection from
Japanese Names
- Authors: Duong Tien Pham and Luan Thanh Nguyen
- Abstract summary: This work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders.
We propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques and cutting-edge transfer learning models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Every human has their own name, a fundamental aspect of their identity and
cultural heritage. The name often conveys a wealth of information, including
details about an individual's background, ethnicity, and, especially, their
gender. By detecting gender through the analysis of names, researchers can
gain insight into linguistic patterns and cultural norms, and these insights
carry over to practical applications. Hence, this work presents a novel dataset
for Japanese name gender detection comprising 64,139 full names in romaji,
hiragana, and kanji forms, along with their biological genders. Moreover, we
propose Gendec, a framework for gender detection from Japanese names that
leverages diverse approaches, including traditional machine learning techniques
and cutting-edge transfer learning models, to predict the gender associated with
Japanese names accurately. Through a thorough investigation, the proposed
framework is expected to be effective and serve potential applications in
various domains.
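
To make the "traditional machine learning" side of the abstract concrete, the following is a minimal sketch of one such technique: a multinomial Naive Bayes classifier over character n-grams of romaji names. The abstract does not say which algorithms or features Gendec uses, so the choice of Naive Bayes, the n-gram range, the names `char_ngrams` and `NaiveBayesNameClassifier`, and the eight toy training names with their `"M"`/`"F"` labels are all assumptions made for illustration. None of the names are taken from the 64,139-name dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Return character n-grams of a lowercased name with boundary markers."""
    padded = "^" + name.lower().replace(" ", "_") + "$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NaiveBayesNameClassifier:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, name):
        """Return the unnormalized log-probability of each class for a name."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(count / total)  # class prior
            for gram in char_ngrams(name):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy romaji full names (family name first); illustrative only, not real data.
train_names = [
    "Sato Hiroshi", "Suzuki Takeshi", "Tanaka Kenta", "Ito Daiki",
    "Sato Yoko", "Suzuki Keiko", "Tanaka Misaki", "Ito Sakura",
]
train_labels = ["M", "M", "M", "M", "F", "F", "F", "F"]

clf = NaiveBayesNameClassifier().fit(train_names, train_labels)

# Score a hypothetical unseen name; with so little data this is only a demo.
print(clf.predict("Yamada Akiko"), clf.log_scores("Yamada Akiko"))
```

Each family name in the toy set appears once per class, so the signal comes from the given names, which is where most gender information in a full name is likely to sit. The same n-gram features would apply to hiragana or kanji input, although kanji names are short enough that single characters may matter more than longer n-grams.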
Related papers
- On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models [21.178861746240507]
We study the presence of heteronormative biases and prejudice against interracial romantic relationships in large language models.
We show that models are less likely to predict romantic relationships for (a) same-gender character pairs than different-gender pairs; and (b) intra/inter-racial character pairs involving Asian names as compared to Black, Hispanic, or White names.
arXiv Detail & Related papers (2024-10-05T01:41:55Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction [8.287754685560815]
We formulate the Pinyin name-gender guessing problem and design a Multi-Task Learning Network assisted by Knowledge Distillation.
Our open-sourced method surpasses commercial name-gender guessing tools by 9.70% to 20.08% relatively, and also outperforms the state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-10T03:16:07Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network [1.13608321568471]
We design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and incorporate the pronunciations of characters.
Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm.
We open-source a more balanced multi-character dataset from an official source together with our code, hoping to help future research promoting gender equality.
arXiv Detail & Related papers (2023-02-01T13:08:50Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words such as "dead" and "designated" are associated with both male and female politicians, a few specific words such as "beautiful" and "divorced" are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Gender Prediction Based on Vietnamese Names with Machine Learning Techniques [2.7528170226206443]
We propose a new dataset for gender prediction based on Vietnamese names.
This dataset comprises over 26,000 full names annotated with genders.
This paper describes six machine learning algorithms and a deep learning model (LSTM) with fastText word embedding for gender prediction on Vietnamese names.
arXiv Detail & Related papers (2020-10-21T09:25:48Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)