A study of conceptual language similarity: comparison and evaluation
- URL: http://arxiv.org/abs/2305.13401v1
- Date: Mon, 22 May 2023 18:28:02 GMT
- Title: A study of conceptual language similarity: comparison and evaluation
- Authors: Haotian Ye, Yihong Liu, Hinrich Schütze
- Abstract summary: An interesting line of research in natural language processing (NLP) aims to incorporate linguistic typology to bridge linguistic diversity.
Recent work has introduced a novel approach to defining language similarity based on how languages represent basic concepts.
In this work, we study the conceptual similarity in detail and evaluate it extensively on a binary classification task.
- Score: 0.3093890460224435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An interesting line of research in natural language processing (NLP) aims to
incorporate linguistic typology to bridge linguistic diversity and assist the
research of low-resource languages. While most works construct linguistic
similarity measures based on lexical or typological features, such as word
order and verbal inflection, recent work has introduced a novel approach to
defining language similarity based on how they represent basic concepts, which
is complementary to existing similarity measures. In this work, we study the
conceptual similarity in detail and evaluate it extensively on a binary
classification task.
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative [7.03497683558609]
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics.
We investigate the capability of pretrained language models (PLMs) to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC).
Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning.
arXiv Detail & Related papers (2022-10-24T13:01:24Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations [12.052672647509732]
Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation.
Most of the current similarity measures are based on bag-of-words approaches.
We propose to use a neural network model to perform the similarity measure for entrainment.
arXiv Detail & Related papers (2021-09-04T19:48:17Z)
- Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks [30.580822082075475]
We introduce three linguistic features that capture cross-cultural similarities that manifest in linguistic patterns and quantify distinct aspects of language pragmatics.
Our analyses show that the proposed pragmatic features do capture cross-cultural similarities and align well with existing work in sociolinguistics and linguistic anthropology.
arXiv Detail & Related papers (2020-06-16T17:20:25Z)
- The Typology of Polysemy: A Multilingual Distributional Framework [6.753781783859273]
We present a novel framework that quantifies semantic affinity, the cross-linguistic similarity of lexical semantics for a concept.
Our results reveal an intricate interaction between semantic domains and extra-linguistic factors, beyond language phylogeny.
arXiv Detail & Related papers (2020-06-02T22:31:40Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.