For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction
- URL: http://arxiv.org/abs/2405.06221v1
- Date: Fri, 10 May 2024 03:16:07 GMT
- Title: For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction
- Authors: Xiaocong Du, Haipeng Zhang,
- Abstract summary: We formulate the Pinyin name-gender guessing problem and design a Multi-Task Learning Network assisted by Knowledge Distillation.
Our open-sourced method surpasses commercial name-gender guessing tools by 9.70% to 20.08% relatively, and also outperforms the state-of-the-art algorithms.
- Score: 8.287754685560815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving gender equality is a pivotal factor in realizing the UN's Global Goals for Sustainable Development. Gender bias studies work towards this and rely on name-based gender inference tools to assign individual gender labels when gender information is unavailable. However, these tools often inaccurately predict gender for Chinese Pinyin names, leading to potential bias in such studies. With the growing participation of Chinese in international activities, this situation is becoming more severe. Specifically, current tools focus on pronunciation (Pinyin) information, neglecting the fact that the latent connections between Pinyin and Chinese characters (Hanzi) behind convey critical information. As a first effort, we formulate the Pinyin name-gender guessing problem and design a Multi-Task Learning Network assisted by Knowledge Distillation that enables the Pinyin embeddings in the model to possess semantic features of Chinese characters and to learn gender information from Chinese character names. Our open-sourced method surpasses commercial name-gender guessing tools by 9.70\% to 20.08\% relatively, and also outperforms the state-of-the-art algorithms.
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words)
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Gendec: A Machine Learning-based Framework for Gender Detection from
Japanese Names [0.0]
This work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders.
We propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models.
arXiv Detail & Related papers (2023-11-18T07:46:59Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Gender, names and other mysteries: Towards the ambiguous for
gender-inclusive translation [7.322734499960981]
This paper explores the case where the source sentence lacks explicit gender markers, but the target sentence contains them due to richer grammatical gender.
We find that many name-gender co-occurrences in MT data are not resolvable with 'unambiguous gender' in the source language.
We discuss potential steps toward gender-inclusive translation which accepts the ambiguity in both gender and translation.
arXiv Detail & Related papers (2023-06-07T16:21:59Z) - For the Underrepresented in Gender Bias Research: Chinese Name Gender
Prediction with Heterogeneous Graph Attention Network [1.13608321568471]
We design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and incorporate the pronunciations of characters.
Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm.
We open-source a more balanced multi-character dataset from an official source together with our code, hoping to help future research promoting gender equality.
arXiv Detail & Related papers (2023-02-01T13:08:50Z) - Towards Understanding Gender-Seniority Compound Bias in Natural Language
Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z) - Gender Bias Hidden Behind Chinese Word Embeddings: The Case of Chinese
Adjectives [0.0]
This paper investigates gender bias in static word embeddings from a unique perspective, Chinese adjectives.
Through a comparison between the produced results and a human-scored data set, we demonstrate how gender bias encoded in word embeddings differentiates from people's attitudes.
arXiv Detail & Related papers (2021-06-01T02:12:45Z) - They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z) - Gender Prediction Based on Vietnamese Names with Machine Learning
Techniques [2.7528170226206443]
We propose a new dataset for gender prediction based on Vietnamese names.
This dataset comprises over 26,000 full names annotated with genders.
This paper describes six machine learning algorithms and a deep learning model (LSTM) with fastText word embedding for gender prediction on Vietnamese names.
arXiv Detail & Related papers (2020-10-21T09:25:48Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.