Identifying Distributional Perspective Differences from Colingual Groups
- URL: http://arxiv.org/abs/2004.04938v2
- Date: Mon, 12 Apr 2021 19:11:33 GMT
- Title: Identifying Distributional Perspective Differences from Colingual Groups
- Authors: Yufei Tian, Tuhin Chakrabarty, Fred Morstatter and Nanyun Peng
- Abstract summary: A lack of mutual understanding among different groups about their perspectives on specific values or events may lead to uninformed decisions or biased opinions.
We study colingual groups and use language corpora as a proxy to identify their distributional perspectives.
We present a novel computational approach to learn shared understandings, and benchmark our method by building culturally-aware models for the English, Chinese, and Japanese languages.
- Score: 41.58939666949895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perspective differences exist among different cultures or languages. A lack
of mutual understanding among different groups about their perspectives on
specific values or events may lead to uninformed decisions or biased opinions.
Automatically understanding the group perspectives can provide essential
background for many downstream applications of natural language processing
techniques. In this paper, we study colingual groups and use language corpora
as a proxy to identify their distributional perspectives. We present a novel
computational approach to learn shared understandings, and benchmark our method
by building culturally-aware models for the English, Chinese, and Japanese
languages. On a held-out set of diverse topics including marriage, corruption,
and democracy, our model achieves high correlation with human judgements
regarding intra-group values and inter-group differences.
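The abstract reports "high correlation with human judgements" but does not name the statistic here. As an illustration only, here is a minimal sketch of Spearman's rank correlation, one common way to compare model scores with human ratings. It assumes one paired model score and human score per topic. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


# Hypothetical per-topic scores (illustrative numbers, not from the paper).
model = [0.9, 0.2, 0.6, 0.4]
human = [4.5, 1.0, 3.8, 2.1]
print(spearman(model, human))  # 1.0: both rank the topics identically
```

A rank-based coefficient fits this setting because the model's scores and the human ratings need not share a scale. Only the ordering of topics is compared.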
Related papers
- Toward Cultural Interpretability: A Linguistic Anthropological Framework for Describing and Evaluating Large Language Models (LLMs) [13.71024600466761]
This article proposes a new integration of linguistic anthropology and machine learning (ML).
We show the theoretical feasibility of a new, conjoint field of inquiry, cultural interpretability (CI).
CI emphasizes how the dynamic relationship between language and culture makes contextually sensitive, open-ended conversation possible.
Date: 2024-11-07T22:01:50Z
- Speech Analysis of Language Varieties in Italy [18.464078978885812]
We focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties.
We also seek to uncover new insights into the relationships among these diverse yet closely related varieties.
Date: 2024-06-22T14:19:51Z
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
Date: 2024-04-09T11:39:53Z
- Investigating Cultural Alignment of Large Language Models [10.738300803676655]
We show that Large Language Models (LLMs) genuinely encapsulate the diverse knowledge adopted by different cultures.
We quantify cultural alignment by simulating sociological surveys, comparing model responses to those of actual survey participants as references.
We introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment.
Date: 2024-02-20T18:47:28Z
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
Date: 2023-05-18T18:15:07Z
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
Date: 2022-04-20T00:13:01Z
- Assessing Multilingual Fairness in Pre-trained Multimodal Representations [8.730027941735804]
We argue that pre-trained vision-and-language representations are individually fair across languages but are not guaranteed to be group-fair.
We conduct experiments to explore the prevalent group disparity across languages and protected groups including race, gender and age.
Date: 2021-06-12T03:57:05Z
- Deception detection in text and its relation to the cultural dimension of individualism/collectivism [6.17866386107486]
We investigate whether differences in the usage of specific linguistic features of deception across cultures can be confirmed and attributed to norms with respect to the individualism/collectivism divide.
We create culture/language-aware classifiers by experimenting with a wide range of n-gram features based on phonology, morphology and syntax.
We conducted our experiments over 11 datasets from five languages, i.e., English, Dutch, Russian, Spanish, and Romanian, from six countries (US, Belgium, India, Russia, Mexico, and Romania).
Date: 2021-05-26T13:09:47Z
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
Date: 2020-05-02T04:34:37Z
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
Date: 2020-04-30T16:25:39Z
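The last entry above mentions singular vector canonical correlation analysis (SVCCA). The following is a minimal NumPy sketch of the general technique. It is not the cited paper's code. Each view is first reduced with SVD to its top `keep` directions. CCA is then run between the reduced views, and the canonical correlations are averaged. The canonical correlations are computed as the singular values of `Qa.T @ Qb`, where `Qa` and `Qb` are orthonormal bases from QR. The function names, the `keep` parameter, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def _svd_reduce(x: np.ndarray, keep: int) -> np.ndarray:
    """Center columns, then project onto the top-`keep` singular directions."""
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :keep] * s[:keep]


def _cca_correlations(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Canonical correlations between two centered views (assumed full column rank)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    qa, _ = np.linalg.qr(a)  # orthonormal basis for the span of a
    qb, _ = np.linalg.qr(b)  # orthonormal basis for the span of b
    # Cosines of the principal angles between the two subspaces.
    return np.clip(np.linalg.svd(qa.T @ qb, compute_uv=False), 0.0, 1.0)


def svcca(x: np.ndarray, y: np.ndarray, keep: int = 10) -> float:
    """Mean canonical correlation after SVD reduction of each view.

    x and y have shape (n_items, dims) with rows aligned across the two views.
    """
    if keep > min(x.shape[1], y.shape[1]) or keep > x.shape[0]:
        raise ValueError("keep must not exceed the available dimensions")
    return float(np.mean(_cca_correlations(_svd_reduce(x, keep), _svd_reduce(y, keep))))


# Synthetic views: y is an invertible linear map of x, so the spans match.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 20))
y = x @ rng.standard_normal((20, 20))
print(svcca(x, y, keep=20))  # approximately 1.0: identical column spaces
```

CCA is invariant to invertible linear transforms of either view. This makes it a reasonable way to compare representation spaces whose coordinates are not aligned. The SVD step discards low-variance directions before that comparison.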