Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants
- URL: http://arxiv.org/abs/2506.15239v2
- Date: Wed, 23 Jul 2025 13:37:11 GMT
- Title: Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants
- Authors: Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri
- Abstract summary: We evaluate the capacity of current language technologies to understand Basque and Spanish language varieties. We use Natural Language Inference (NLI) as a pivot task and introduce a novel, manually-curated parallel dataset.
- Score: 7.160574787275442
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we evaluate the capacity of current language technologies to understand Basque and Spanish language varieties. We use Natural Language Inference (NLI) as a pivot task and introduce a novel, manually-curated parallel dataset in Basque and Spanish, along with their respective variants. Our empirical analysis of crosslingual and in-context learning experiments using encoder-only and decoder-based Large Language Models (LLMs) shows a performance drop when handling linguistic variation, especially in Basque. Error analysis suggests that this decline is not due to lexical overlap, but rather to the linguistic variation itself. Further ablation experiments indicate that encoder-only models particularly struggle with Western Basque, which aligns with linguistic theory that identifies peripheral dialects (e.g., Western) as more distant from the standard. All data and code are publicly available.
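The pivot task above pairs a premise with a hypothesis and asks whether the second follows from the first; the error analysis additionally checks whether performance drops track surface lexical overlap. As a minimal sketch of such an overlap measure (the sentences and the Jaccard score are illustrative assumptions, not the released dataset or the paper's exact metric):

```python
# Toy illustration of token-level lexical overlap (Jaccard similarity),
# the kind of measure used to check whether NLI performance drops track
# surface overlap rather than deeper linguistic variation.
def jaccard_overlap(premise: str, hypothesis: str) -> float:
    """Jaccard similarity between the token sets of two sentences."""
    a = set(premise.lower().split())
    b = set(hypothesis.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical parallel NLI pair in standard Spanish and a variant form:
# the variant swaps one content word, so overlap stays identical.
standard = ("El coche es rojo", "El vehiculo es rojo")
variant = ("El carro es rojo", "El vehiculo es rojo")

print(round(jaccard_overlap(*standard), 3))  # 0.6
print(round(jaccard_overlap(*variant), 3))   # 0.6
```

If accuracy drops on the variant pair while overlap is unchanged, the drop cannot be attributed to lexical overlap, which is the shape of the argument in the abstract.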
Related papers
- Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks [7.216732751280017]
We correlate Tokenization Parity (TP) and Information Parity (IP), as measures of representational biases in pre-trained multilingual models, with downstream performance. We compare state-of-the-art decoder-only LLMs with encoder-based models across three tasks: dialect classification, topic classification, and extractive question answering. Our analysis reveals that TP is a better predictor of performance on tasks reliant on syntactic and morphological cues, while IP better predicts performance on semantic tasks.
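Tokenization Parity can be read as how evenly a tokenizer segments parallel sentences in two varieties. A minimal sketch, assuming a whitespace tokenizer as a stand-in for the models' subword tokenizers and a min/max ratio as the parity score (the paper's exact formulation may differ):

```python
# Sketch of Tokenization Parity (TP): the ratio of token counts a tokenizer
# produces for parallel sentences in two language varieties. A ratio near 1
# means both varieties are segmented with similar efficiency.
def tokenization_parity(tokens_a: int, tokens_b: int) -> float:
    """Ratio of the smaller token count to the larger (1.0 = perfect parity)."""
    return min(tokens_a, tokens_b) / max(tokens_a, tokens_b)

def count_tokens(sentence: str) -> int:
    return len(sentence.split())  # stand-in for a real subword tokenizer

standard = "the house is big"
dialect = "tha hoose is gey big"
tp = tokenization_parity(count_tokens(standard), count_tokens(dialect))
print(round(tp, 2))  # 4 vs 5 tokens -> 0.8
```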
arXiv Detail & Related papers (2025-09-24T12:13:53Z)
- Text2Cypher Across Languages: Evaluating Foundational Models Beyond English [0.0]
This paper investigates the performance of foundational LLMs on the Text2Cypher task across multiple languages. We create and release a multilingual test set by translating English questions into Spanish and Turkish while preserving the original Cypher queries.
arXiv Detail & Related papers (2025-06-26T16:31:10Z)
- Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
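Synthetic code-switching in this sense swaps tokens for translations from a bilingual lexicon at some rate. A hedged sketch with a toy lexicon and swap rate (`LEXICON` and `code_switch` are illustrative names, not the paper's implementation):

```python
import random

# Sketch of synthetic code-switching for pre-training data: tokens in a
# monolingual sentence are swapped for translations from a bilingual
# lexicon with some probability. The lexicon and rate are toy assumptions.
LEXICON = {"house": "casa", "big": "grande", "the": "la"}

def code_switch(sentence: str, rate: float, rng: random.Random) -> str:
    out = []
    for tok in sentence.split():
        if tok in LEXICON and rng.random() < rate:
            out.append(LEXICON[tok])  # swap in the translation
        else:
            out.append(tok)           # keep the original token
    return " ".join(out)

rng = random.Random(0)
print(code_switch("the house is big", rate=1.0, rng=rng))  # la casa is grande
```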
arXiv Detail & Related papers (2025-04-02T15:09:58Z)
- Modeling Orthographic Variation in Occitan's Dialects [3.038642416291856]
Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.
arXiv Detail & Related papers (2024-04-30T07:33:51Z)
- We're Calling an Intervention: Exploring Fundamental Hurdles in Adapting Language Models to Nonstandard Text [8.956635443376527]
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text. We do so by designing interventions that approximate core features of user-generated text and their interactions with existing biases of language models. Applying our interventions during language model adaptation to nonstandard text variations, we gain important insights into when such adaptation is successful.
arXiv Detail & Related papers (2024-04-10T18:56:53Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Evaluating Shortest Edit Script Methods for Contextual Lemmatization [6.0158981171030685]
Modern contextual lemmatizers often rely on automatically induced Shortest Edit Scripts (SES) to transform a word form into its lemma.
Previous work has not investigated the direct impact of SES on final lemmatization performance.
We show that computing the casing and edit operations separately is beneficial overall, and especially so for languages with highly inflected morphology.
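An SES can be read as a compact rule turning a word form into its lemma. A minimal suffix-only sketch, with casing tracked as a separate flag in the spirit of the separation evaluated above (real SES induction is more general than this illustrative variant):

```python
# Sketch of a Shortest Edit Script (SES) style lemmatization rule: encode
# the form-to-lemma transformation as "delete N trailing characters, then
# append a suffix", with casing handled as a separate boolean flag.
def induce_rule(form: str, lemma: str):
    f, l = form.lower(), lemma.lower()
    i = 0
    # longest common prefix of the lowercased form and lemma
    while i < min(len(f), len(l)) and f[i] == l[i]:
        i += 1
    # (chars to delete, suffix to append, was the form capitalized?)
    return (len(f) - i, l[i:], form[:1].isupper())

def apply_rule(form: str, rule) -> str:
    n_delete, suffix, _ = rule
    base = form.lower()
    base = base[: len(base) - n_delete] if n_delete else base
    return base + suffix

rule = induce_rule("Running", "run")
print(rule)                          # (4, '', True)
print(apply_rule("Running", rule))   # run
```

Keeping the casing flag out of the edit operations means one rule covers "Running" and "running" alike, which is one way to read the benefit reported above.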
arXiv Detail & Related papers (2024-03-25T17:28:24Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
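Bitext retrieval as an alignment measure scores how often a source sentence's nearest neighbour in the target space is its gold translation. A minimal sketch with made-up toy vectors in place of model embeddings:

```python
import math

# Sketch of bitext retrieval as an alignment measure: for each source
# embedding, retrieve the nearest target embedding by cosine similarity
# and score how often the gold-parallel index is ranked first.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieval_accuracy(src, tgt):
    """Fraction of source vectors whose nearest target vector shares their index."""
    hits = 0
    for i, s in enumerate(src):
        best = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        hits += best == i
    return hits / len(src)

src = [[1.0, 0.1], [0.1, 1.0]]  # toy "source language" embeddings
tgt = [[0.9, 0.2], [0.2, 0.9]]  # toy "target language" embeddings
print(retrieval_accuracy(src, tgt))  # 1.0
```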
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
- A Bayesian Multilingual Document Model for Zero-shot Topic Identification and Discovery [1.9215779751499527]
The model is an extension of BaySMM [Kesiraju et al., 2020] to the multilingual scenario.
We propagate the learned uncertainties through linear classifiers that benefit zero-shot cross-lingual topic identification.
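Propagating uncertainty through a linear classifier can be approximated by Monte Carlo: sample embeddings from the learned Gaussian and average the classifier's probabilities over the samples. A toy sketch (the Gaussian parameters and weights below are invented, not BaySMM's):

```python
import math
import random

# Sketch of propagating embedding uncertainty through a linear classifier:
# treat a document embedding as a Gaussian (mean, std), draw Monte Carlo
# samples, and average the sigmoid probabilities over the samples.
def predict_with_uncertainty(mean, std, w, b, n_samples=1000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # sample one embedding from the diagonal Gaussian
        z = [m + s * rng.gauss(0, 1) for m, s in zip(mean, std)]
        logit = sum(wi * zi for wi, zi in zip(w, z)) + b
        total += 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return total / n_samples

p = predict_with_uncertainty(mean=[1.0, -0.5], std=[0.3, 0.3], w=[2.0, 1.0], b=0.0)
print(round(p, 2))
```

Averaging probabilities rather than classifying the mean embedding is what lets the learned uncertainty influence the final decision.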
We revisit cross-lingual topic identification in zero-shot settings by taking a deeper dive into current datasets.
arXiv Detail & Related papers (2020-07-02T19:55:08Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.