Related papers: Evaluating Morphological Compositional Generalization in Large Language Models

URL: http://arxiv.org/abs/2410.12656v2
Date: Wed, 06 Nov 2024 14:14:58 GMT
Title: Evaluating Morphological Compositional Generalization in Large Language Models
Authors: Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Lonneke van der Plas, Duygu Ataman
Abstract summary: We investigate the morphological generalization abilities of large language models (LLMs) through the lens of compositionality. We focus on agglutinative languages such as Turkish and Finnish. Our analysis shows that LLMs struggle with morphological compositional generalization particularly when applied to novel word roots. While models can identify individual morphological combinations better than chance, their performance lacks systematicity, leading to significant accuracy gaps compared to humans.
Score: 17.507983593566223
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated significant progress in various natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, raising doubts about whether these models learn language similarly to humans. While humans exhibit compositional generalization and linguistic creativity in language use, the extent to which LLMs replicate these abilities, particularly in morphology, is under-explored. In this work, we systematically investigate the morphological generalization abilities of LLMs through the lens of compositionality. We define morphemes as compositional primitives and design a novel suite of generative and discriminative tasks to assess morphological productivity and systematicity. Focusing on agglutinative languages such as Turkish and Finnish, we evaluate several state-of-the-art instruction-finetuned multilingual models, including GPT-4 and Gemini. Our analysis shows that LLMs struggle with morphological compositional generalization particularly when applied to novel word roots, with performance declining sharply as morphological complexity increases. While models can identify individual morphological combinations better than chance, their performance lacks systematicity, leading to significant accuracy gaps compared to humans.

Related papers

IMPACT: Inflectional Morphology Probes Across Complex Typologies [0.0]
IMPACT is a synthetically generated evaluation framework focused on inflectional morphology. It is designed to evaluate performance across five morphologically rich languages: Arabic, Russian, Finnish, Turkish, and Hebrew. We assess eight multilingual LLMs that, despite strong English performance, struggle with other languages and uncommon morphological patterns.
arXiv Detail & Related papers (2025-06-30T14:58:23Z)
The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition [50.86415025650168]
Masked image modeling (MIM) tends to exploit local structures to reconstruct visual patterns, resulting in limited linguistic knowledge. We propose a Linguistics-aware Masked Image Modeling (LMIM) approach, which channels the linguistic information into the decoding process of MIM through a separate branch.
arXiv Detail & Related papers (2025-03-24T14:53:35Z)
Can Language Models Learn Typologically Implausible Languages? [62.823015163987996]
Grammatical features across human languages show intriguing correlations often attributed to learning biases in humans. We discuss how language models (LMs) allow us to better determine the role of domain-general learning biases in language universals. We test LMs on an array of highly naturalistic but counterfactual versions of the English (head-initial) and Japanese (head-final) languages.
arXiv Detail & Related papers (2025-02-17T20:40:01Z)
Benchmarking Linguistic Diversity of Large Language Models [14.824871604671467]
This paper emphasizes the importance of examining the preservation of human linguistic richness by language models. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
arXiv Detail & Related papers (2024-12-13T16:46:03Z)
Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning). Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities. We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages. We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts. We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs). We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena. As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization [35.12566667582262]
LinguAlchemy is a regularization method that incorporates various linguistic information covering typological, geographical, and phylogenetic features. Our LinguAlchemy significantly improves the performance of mBERT and XLM-R on low-resource languages.
arXiv Detail & Related papers (2024-01-11T16:48:00Z)
Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew [19.4968960182412]
We investigate the hypothesis that incorporating explicit morphological knowledge in the pre-training phase can improve the performance of PLMs for morphologically rich languages. We propose various morphologically driven tokenization methods enabling the model to leverage morphological cues beyond raw text. Our experiments show that morphologically driven tokenization demonstrates improved results compared to a standard language-agnostic tokenization.
arXiv Detail & Related papers (2023-11-01T17:02:49Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
Cross-Lingual Transfer of Cognitive Processing Complexity [11.939409227407769]
We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity. We show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages.
arXiv Detail & Related papers (2023-02-24T15:48:23Z)
Language Embeddings Sometimes Contain Typological Generalizations [0.0]
We train neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations.
arXiv Detail & Related papers (2023-01-19T15:09:59Z)
Morphology Matters: A Multilingual Language Modeling Analysis [8.791030561752384]
Prior studies disagree on whether inflectional morphology makes languages harder to model. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. Several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data.
arXiv Detail & Related papers (2020-12-11T11:55:55Z)
Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers. We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures. We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.