GeniL: A Multilingual Dataset on Generalizing Language
- URL: http://arxiv.org/abs/2404.05866v2
- Date: Fri, 9 Aug 2024 16:20:27 GMT
- Title: GeniL: A Multilingual Dataset on Generalizing Language
- Authors: Aida Mostafazadeh Davani, Sagar Gubbi, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran
- Abstract summary: Current methods to assess the presence of stereotypes in generated language rely on simple template- or co-occurrence-based measures.
We argue that understanding the sentential context is crucial for detecting instances of generalization.
We build GeniL, a multilingual dataset of over 50K sentences from 9 languages annotated for instances of generalizations.
- Score: 19.43611224855484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative language models are transforming our digital ecosystem, but they often inherit societal biases, for instance, stereotypes associating certain attributes with specific identity groups. While whether and how these biases are mitigated may depend on the specific use case, being able to effectively detect instances of stereotype perpetuation is a crucial first step. Current methods to assess the presence of stereotypes in generated language rely on simple template- or co-occurrence-based measures, without accounting for the variety of sentential contexts in which they manifest. We argue that understanding the sentential context is crucial for detecting instances of generalization. We distinguish two types of generalizations: (1) language that merely mentions the presence of a generalization ("people think the French are very rude"), and (2) language that reinforces such a generalization ("as French they must be rude"), from non-generalizing context ("My French friends think I am rude"). For meaningful stereotype evaluations, we need to reliably distinguish such instances of generalization. We introduce the new task of detecting generalization in language, and build GeniL, a multilingual dataset of over 50K sentences from 9 languages (English, Arabic, Bengali, Spanish, French, Hindi, Indonesian, Malay, and Portuguese) annotated for instances of generalization. We demonstrate that the likelihood of a co-occurrence being an instance of generalization is usually low, and varies across languages, identity groups, and attributes. We build classifiers to detect generalization in language with an overall PR-AUC of 58.7, with varying degrees of performance across languages. Our research provides data and tools to enable a nuanced understanding of stereotype perpetuation, a crucial step towards more inclusive and responsible language technologies.
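The abstract's three example sentences show why a bare co-occurrence measure cannot support the proposed three-way distinction: the identity term and attribute term co-occur in all of them, yet only two involve a generalization. A minimal sketch of this failure mode (all names here are illustrative, not from the GeniL release):

```python
# Hedged sketch: a naive co-occurrence measure applied to the abstract's
# three example sentences. Function and variable names are hypothetical.

EXAMPLES = [
    ("people think the French are very rude", "mention"),         # mentions a generalization
    ("as French they must be rude", "reinforcement"),             # reinforces a generalization
    ("My French friends think I am rude", "non-generalizing"),    # no generalization
]

def cooccurs(sentence: str, identity: str, attribute: str) -> bool:
    """Template/co-occurrence measure: does the identity term appear
    alongside the attribute term anywhere in the sentence?"""
    words = sentence.lower().split()
    return identity.lower() in words and attribute.lower() in words

# The measure fires identically on every sentence, regardless of label,
# which is why the paper argues for context-sensitive detection instead.
for sentence, label in EXAMPLES:
    print(cooccurs(sentence, "French", "rude"), label)
```

Since the co-occurrence signal is constant across the three labels, any evaluation built on it conflates mention, reinforcement, and non-generalizing use, which motivates the sentence-level annotation in GeniL.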
Related papers
- Towards Generalized Offensive Language Identification [13.261770797304777]
This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark.
Our findings will be useful in creating robust real-world offensive language detection systems.
arXiv Detail & Related papers (2024-07-26T13:50:22Z) - The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate language imbalance, a counterintuitive and previously unexplored driver of cross-lingual generalisation.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet it remains inconclusive whether language imbalance causes cross-lingual generalisation in that setting.
arXiv Detail & Related papers (2024-04-11T17:58:05Z) - Quantifying Stereotypes in Language [6.697298321551588]
We quantify stereotypes in language by annotating a dataset.
We use pre-trained language models (PLMs), trained on this dataset, to predict the stereotype content of sentences.
We discuss stereotypes about common social issues such as hate speech, sexism, sentiments, and disadvantaged and advantaged groups.
arXiv Detail & Related papers (2024-01-28T01:07:21Z) - Multilingual large language models leak human stereotypes across language boundaries [25.903732543380528]
We investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi.
Hindi appears to be the most susceptible to influence from other languages, while Chinese is the least.
arXiv Detail & Related papers (2023-12-12T10:24:17Z) - Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization [27.368684663279463]
We investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization.
Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability.
We propose a meta-learning-based method to learn to align conceptual spaces of different languages.
arXiv Detail & Related papers (2023-10-19T14:50:51Z) - Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z) - Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions [73.56753518339247]
We present a novel framework informed by linguistic theory to generate exemplars -- specific cases when a generic holds true or false.
We generate 19k exemplars for 650 generics and show that our framework outperforms a strong GPT-3 baseline by 12.8 precision points.
arXiv Detail & Related papers (2022-05-23T22:45:53Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Lower Perplexity is Not Always Human-Like [25.187238589433385]
We re-examine an established generalization -- the lower perplexity a language model has, the more human-like the language model is -- in Japanese.
Our experiments demonstrate that this established generalization exhibits a surprising lack of universality.
Our results suggest that a cross-lingual evaluation will be necessary to construct human-like computational models.
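The generalization re-examined above concerns perplexity, which under its standard definition is the exponentiated average negative log-probability a model assigns to the tokens of a sequence. A minimal sketch of that computation (this is the textbook formula, not code from the cited paper):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given per-token probabilities p_1..p_N:
    exp(-(1/N) * sum(log p_i)). A lower value means the model found the
    text more predictable; the paper above shows that, at least for
    Japanese, lower perplexity does not always mean more human-like."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model assigning uniform probability 1/4 to every token has
# perplexity 4 (up to floating-point rounding).
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Because perplexity is a purely distributional quantity, nothing in its definition references human reading behaviour, which is why its correlation with human-likeness is an empirical question that can fail cross-lingually.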
arXiv Detail & Related papers (2021-06-02T15:27:29Z) - A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
Modern neural networks, by contrast, struggle to interpret novel compositions.
We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
arXiv Detail & Related papers (2020-03-11T08:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.