An exploration of the encoding of grammatical gender in word embeddings
- URL: http://arxiv.org/abs/2008.01946v2
- Date: Tue, 3 Nov 2020 13:11:41 GMT
- Title: An exploration of the encoding of grammatical gender in word embeddings
- Authors: Hartger Veeman and Ali Basirat
- Abstract summary: The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined.
It is found that there is an overlap in how grammatical gender is encoded in Swedish, Danish, and Dutch embeddings.
- Score: 0.6461556265872973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vector representation of words, known as word embeddings, has opened a
new research approach in linguistic studies. These representations can capture
different types of information about words. Grammatical gender is
a typical classification of nouns based on their formal and semantic
properties. The study of grammatical gender based on word embeddings can give
insight into discussions on how grammatical genders are determined. In this
study, we compare different sets of word embeddings according to the accuracy
of a neural classifier determining the grammatical gender of nouns. It is found
that there is an overlap in how grammatical gender is encoded in Swedish,
Danish, and Dutch embeddings. Our experiments with contextualized embeddings
indicate that adding more contextual information to the embeddings is
detrimental to the classifier's performance. We also observed that removing
morpho-syntactic features such as articles from the training corpora of
embeddings decreases the classification performance dramatically, indicating that a
large portion of the gender information is encoded in the relationship between nouns
and articles.
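The evaluation idea in the abstract (compare sets of embeddings by how accurately a classifier recovers noun gender from them) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' setup: the vectors are synthetic, the `signal` parameter is a hypothetical stand-in for how strongly gender is encoded in an embedding set, and a single-layer logistic regression trained with NumPy replaces the paper's neural classifier. All function names are invented for this sketch.

```python
import numpy as np


def make_toy_embeddings(n_nouns=400, dim=20, signal=1.5, seed=0):
    """Create synthetic 'noun embeddings' with a binary gender label.

    Assumption: gender is encoded as a shift along one dimension whose
    size is `signal`. Real embeddings are not structured this simply.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_nouns)   # 0 or 1, two gender classes
    vectors = rng.normal(size=(n_nouns, dim))   # unrelated "semantic" noise
    vectors[:, 0] += signal * (2 * labels - 1)  # gender shift: -signal or +signal
    return vectors, labels


def train_classifier(X, y, epochs=300, lr=0.1):
    """Fit a logistic regression by plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        error = probs - y                        # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


def evaluate(X, y, train_fraction=0.75):
    """Train on the first part of the data and return held-out accuracy."""
    split = int(len(y) * train_fraction)
    w, b = train_classifier(X[:split], y[:split])
    predictions = (X[split:] @ w + b > 0).astype(int)
    return float(np.mean(predictions == y[split:]))


if __name__ == "__main__":
    # Compare two hypothetical embedding sets: one that encodes gender
    # and one where the gender signal has been removed entirely.
    for name, signal in [("with gender signal", 1.5), ("signal removed", 0.0)]:
        X, y = make_toy_embeddings(signal=signal)
        print(f"{name}: held-out accuracy = {evaluate(X, y):.2f}")
```

Held-out accuracy serves here as a proxy for how much gender information an embedding set carries. In this toy setup, the drop when the signal is removed loosely mirrors the reported effect of stripping articles from the training corpora.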
Related papers
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- Investigating grammatical abstraction in language models using few-shot learning of novel noun gender [0.0]
We conduct a noun learning experiment to assess whether an LSTM and a decoder-only transformer can achieve human-like abstraction of grammatical gender in French.
We find that both language models effectively generalise novel noun gender from one to two learning examples and apply the learnt gender across agreement contexts.
While the generalisation behaviour of models suggests that they represent grammatical gender as an abstract category, like humans, further work is needed to explore the details.
arXiv Detail & Related papers (2024-03-15T14:25:59Z)
- The Causal Influence of Grammatical Gender on Distributional Semantics [87.8027818528463]
How much meaning influences gender assignment across languages is an active area of research in linguistics and cognitive science.
We offer a novel, causal graphical model that jointly represents the interactions between a noun's grammatical gender, its meaning, and adjective choice.
When we control for the meaning of the noun, the relationship between grammatical gender and adjective choice is near zero and insignificant.
arXiv Detail & Related papers (2023-11-30T13:58:13Z)
- Measuring Gender Bias in Word Embeddings of Gendered Languages Requires Disentangling Grammatical Gender Signals [3.0349733976070015]
We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages.
We show that disentangling grammatical gender signals from word embeddings may lead to improvement in semantic machine learning tasks.
arXiv Detail & Related papers (2022-06-03T17:11:00Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Gender Bias Hidden Behind Chinese Word Embeddings: The Case of Chinese Adjectives [0.0]
This paper investigates gender bias in static word embeddings from a unique perspective, Chinese adjectives.
Through a comparison between the produced results and a human-scored data set, we demonstrate how gender bias encoded in word embeddings differs from people's attitudes.
arXiv Detail & Related papers (2021-06-01T02:12:45Z)
- Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings [0.0]
Crosslinguistic word embeddings reveal that topical gender bias interacts with, and is surpassed in magnitude by, the effect of grammatical gender associations.
This finding has implications for downstream applications such as machine translation.
arXiv Detail & Related papers (2020-05-18T16:39:16Z)
- On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs [57.015586483981885]
We use large-scale corpora in six different gendered languages.
We find statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, indirect objects, and as subjects.
arXiv Detail & Related papers (2020-05-03T22:49:44Z)
- Predicting Declension Class from Form and Meaning [70.65971611552871]
Class membership is far from deterministic, but the phonological form of a noun and/or its meaning can often provide imperfect clues.
We operationalize this by measuring how much information, in bits, we can glean about declension class from knowing the form and/or meaning of nouns.
We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class.
arXiv Detail & Related papers (2020-05-01T21:48:48Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)