For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network
- URL: http://arxiv.org/abs/2302.00419v1
- Date: Wed, 1 Feb 2023 13:08:50 GMT
- Title: For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network
- Authors: Zihao Pan, Kai Peng, Shuai Ling, Haipeng Zhang
- Abstract summary: We design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and incorporate the pronunciations of characters.
Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm.
We open-source a more balanced multi-character dataset from an official source together with our code, hoping to help future research promoting gender equality.
- Score: 1.13608321568471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving gender equality is an important pillar for humankind's sustainable
future. Pioneering data-driven gender bias research is based on large-scale
public records such as scientific papers, patents, and company registrations,
covering female researchers, inventors and entrepreneurs, and so on. Since
gender information is often missing in relevant datasets, studies rely on tools
to infer genders from names. However, available open-source Chinese
gender-guessing tools are not yet suitable for scientific purposes; this may
be partly responsible for Chinese women being underrepresented in mainstream
gender bias research and may limit the universality of such studies.
Specifically,
these tools focus on character-level information while overlooking the fact
that the combinations of Chinese characters in multi-character names, as well
as the components and pronunciations of characters, convey important messages.
As a first effort, we design a Chinese Heterogeneous Graph Attention (CHGAT)
model to capture the heterogeneity in component relationships and incorporate
the pronunciations of characters. Our model largely surpasses current tools and
also outperforms the state-of-the-art algorithm. Last but not least, the most
popular Chinese name-gender dataset is single-character-based, comes from an
unreliable source, and covers far fewer female names, naturally hindering
relevant studies. We open-source a more balanced multi-character dataset from an
official source together with our code, hoping to help future research
promoting gender equality.
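As a concrete but strictly illustrative picture of the idea in the abstract, the sketch below pools character, component, and pronunciation nodes into a single name representation with type-aware attention, then classifies gender. It is a minimal hypothetical stand-in, not the authors' released CHGAT implementation; the class name, dimensions, and toy graph are all assumptions.

```python
# Hypothetical sketch of heterogeneous graph attention for name-gender
# prediction; NOT the authors' CHGAT release. Node features are random
# stand-ins for learned character/component/pronunciation embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroNameAttention(nn.Module):
    def __init__(self, dim, node_types=("char", "component", "pinyin")):
        super().__init__()
        # a separate projection per node type keeps the graph heterogeneous
        self.proj = nn.ModuleDict({t: nn.Linear(dim, dim) for t in node_types})
        self.attn = nn.Linear(2 * dim, 1)   # scores (name, neighbor) pairs
        self.out = nn.Linear(dim, 2)        # logits for female / male

    def forward(self, name_vec, neighbors):
        # neighbors: list of (node_type, feature_vector) pairs for one name
        h = torch.stack([self.proj[t](x) for t, x in neighbors])  # (N, dim)
        q = name_vec.unsqueeze(0).expand_as(h)                    # (N, dim)
        alpha = F.softmax(self.attn(torch.cat([q, h], -1)).squeeze(-1), 0)
        pooled = (alpha.unsqueeze(-1) * h).sum(0)                 # weighted sum
        return self.out(pooled)

dim = 16
model = HeteroNameAttention(dim)
name_vec = torch.randn(dim)            # embedding of the full given name
neighbors = [("char", torch.randn(dim)), ("char", torch.randn(dim)),
             ("component", torch.randn(dim)), ("pinyin", torch.randn(dim))]
print(model(name_vec, neighbors).softmax(-1))  # toy gender distribution
```

The per-type projections are what make the attention "heterogeneous": each node type gets its own transformation before sharing a single attention mechanism.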
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction [8.287754685560815]
We formulate the Pinyin name-gender guessing problem and design a Multi-Task Learning Network assisted by Knowledge Distillation.
Our open-sourced method surpasses commercial name-gender guessing tools by 9.70% to 20.08% relatively, and also outperforms the state-of-the-art algorithms.
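The summary above pairs multi-task learning with knowledge distillation; as a hedged sketch of just the distillation component (the loss weights, temperature, and teacher/student logits are placeholder assumptions, not the paper's design), a Pinyin-input student might combine its gender-classification loss with a soft-label term from a character-input teacher:

```python
# Hypothetical sketch: gender cross-entropy plus a distillation term that
# lets a pinyin-input student mimic a character-input teacher. Weights,
# temperature, and both sets of logits are illustrative placeholders.
import torch
import torch.nn.functional as F

def task_plus_kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    task = F.cross_entropy(student_logits, labels)           # hard-label task loss
    soft = F.kl_div(F.log_softmax(student_logits / T, -1),
                    F.softmax(teacher_logits / T, -1),
                    reduction="batchmean") * T * T           # distillation loss
    return alpha * task + (1 - alpha) * soft

labels = torch.tensor([0, 1])                 # toy batch: female, male
student = torch.randn(2, 2, requires_grad=True)
teacher = torch.randn(2, 2)                   # from a character-based teacher
print(task_plus_kd_loss(student, teacher, labels))
```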
arXiv Detail & Related papers (2024-05-10T03:16:07Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Gender inference: can ChatGPT outperform common commercial tools? [0.0]
We compare the performance of a generative Artificial Intelligence (AI) tool, ChatGPT, with three commercially available list-based and machine-learning-based gender inference tools.
Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name and first and last name) impact the accuracy of their predictions.
ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available.
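A minimal sketch of that evaluation protocol, with a stub standing in for the real tool APIs (ChatGPT, Namsor, etc.); the names, labels, and toy rule are assumptions:

```python
# Hypothetical harness for the comparison protocol described above: measure
# each tool's accuracy under different input variants (first name only vs.
# first + last name). predict_stub stands in for a real API call.
def accuracy(predict, samples):
    return sum(predict(name) == gender for name, gender in samples) / len(samples)

first_only = [("Maria", "F"), ("John", "M")]
first_last = [("Maria Silva", "F"), ("John Smith", "M")]

def predict_stub(name):                  # toy rule, not a real tool
    return "F" if name.split()[0].endswith("a") else "M"

for variant, data in [("first name", first_only), ("first+last", first_last)]:
    print(variant, accuracy(predict_stub, data))
```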
arXiv Detail & Related papers (2023-11-24T22:09:14Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
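A minimal sketch of lexicon-based counting in the spirit of that description; the tiny English lexicon here is an assumption, standing in for Gender-GAP's curated multilingual lexicon of gendered person-nouns:

```python
# Hypothetical sketch of lexicon-based gender characterisation: count
# matches against a gendered person-noun lexicon. The lexicon is a toy.
from collections import Counter

LEXICON = {"woman": "feminine", "women": "feminine", "mother": "feminine",
           "man": "masculine", "men": "masculine", "father": "masculine"}

def gender_counts(text: str) -> Counter:
    counts = Counter()
    for token in text.lower().split():
        token = token.strip(".,;:!?")
        if token in LEXICON:
            counts[LEXICON[token]] += 1
    return counts

print(gender_counts("The men spoke; one woman and her mother listened."))
# Counter({'feminine': 2, 'masculine': 1})
```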
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation [28.38578407487603]
We propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels.
We address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias.
CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation [10.542861450223128]
We find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments in corpora from three domains.
We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models.
arXiv Detail & Related papers (2021-09-08T18:14:11Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians.
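A minimal sketch of that measurement, assuming toy name and descriptor lists in place of the paper's multilingual language-model outputs:

```python
# Hypothetical sketch: tally descriptor words co-occurring with politician
# names, split by the name's gender. Names, descriptors, and sentences are
# toy stand-ins for the paper's language-model generations.
from collections import defaultdict

NAMES = {"angela": "female", "barack": "male"}
DESCRIPTORS = {"beautiful", "divorced", "dead", "designated", "powerful"}

def descriptor_counts(sentences):
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = [t.strip(".,!?") for t in sentence.lower().split()]
        genders = {NAMES[t] for t in tokens if t in NAMES}
        for gender in genders:          # attribute words to each name's gender
            for t in tokens:
                if t in DESCRIPTORS:
                    counts[gender][t] += 1
    return counts

result = descriptor_counts(["Angela is powerful.", "Barack was designated."])
print({g: dict(c) for g, c in result.items()})
# {'female': {'powerful': 1}, 'male': {'designated': 1}}
```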
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)