Lexical Diversity in Kinship Across Languages and Dialects
- URL: http://arxiv.org/abs/2308.13056v2
- Date: Thu, 26 Oct 2023 12:54:30 GMT
- Title: Lexical Diversity in Kinship Across Languages and Dialects
- Authors: Hadi Khalilia, G\'abor Bella, Abed Alhakim Freihat, Shandy Darma,
Fausto Giunchiglia
- Abstract summary: We introduce a method to enrich computational lexicons with content relating to linguistic diversity.
The method is verified through two large-scale case studies on kinship terminology.
- Score: 6.80465507148218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Languages are known to describe the world in diverse ways. Across lexicons,
diversity is pervasive, appearing through phenomena such as lexical gaps and
untranslatability. However, in computational resources, such as multilingual
lexical databases, diversity is hardly ever represented. In this paper, we
introduce a method to enrich computational lexicons with content relating to
linguistic diversity. The method is verified through two large-scale case
studies on kinship terminology, a domain known to be diverse across languages
and cultures: one case study deals with seven Arabic dialects, while the other
one with three Indonesian languages. Our results, made available as browseable
and downloadable computational resources, extend prior linguistics research on
kinship terminology, and provide insight into the extent of diversity even
within linguistically and culturally close communities.
Related papers
- Crowdsourcing Lexical Diversity [7.569845058082537]
This paper proposes a novel crowdsourcing methodology for reducing bias in lexicons.
Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food.
We validated our method by applying it to two case studies focused on food-related terminology.
arXiv Detail & Related papers (2024-10-30T15:45:09Z) - Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z) - Representing Interlingual Meaning in Lexical Databases [5.654039329474587]
We show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words.
In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner.
arXiv Detail & Related papers (2023-01-22T17:41:29Z) - Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of
Lexical Gaps in Kinship [4.970603969125883]
We capture the phenomenon of diversity through the notions of lexical gap and language-specific word.
We publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages.
arXiv Detail & Related papers (2022-04-11T12:36:26Z) - Language Diversity: Visible to Humans, Exploitable by Machines [6.0111449785499484]
The UKC website lets users explore millions of individual words and their meanings.
The UKC LiveLanguage Catalogue provides access to the underlying lexical data in a computer-processable form.
arXiv Detail & Related papers (2022-03-09T14:04:16Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual
Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.