Related papers: Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure

URL: http://arxiv.org/abs/2311.07497v2
Date: Wed, 12 Jun 2024 13:41:16 GMT
Title: Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure
Authors: David Arps, Laura Kallmeyer, Younes Samih, Hassan Sajjad
Abstract summary: SPUD (Semantically Perturbed Universal Dependencies) is a framework for creating nonce treebanks for the Universal Dependencies (UD) corpora. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks.
Score: 15.564927804136852
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, and ensures grammaticality via language-specific rules. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we investigate the effect of nonce data on word co-occurrence statistics, as measured by perplexity scores of autoregressive (ALM) and masked language models (MLM). We find that ALM scores are significantly more affected by nonce data than MLM scores. Second, we show how nonce data affects the performance of syntactic dependency probes. We replicate the findings of Müller-Eberstein et al. (2022) on nonce test data and show that the performance declines on both MLMs and ALMs with respect to original test data. However, a majority of the performance is kept, suggesting that the probe indeed learns syntax independently from semantics.

Related papers

Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks [27.561894897347376]
We compare knowledge retrieval scores between the initial (templated) MLAMA dataset and its sentence-level translations made by Google Translate and ChatGPT. We observe a significant increase in knowledge retrieval scores, and provide a qualitative analysis for possible reasons behind it. We also make an additional analysis of 5 more languages from different families and see similar patterns.
arXiv Detail & Related papers (2025-10-16T20:16:56Z)
Parsing the Switch: LLM-Based UD Annotation for Complex Code-Switched and Low-Resource Languages [11.627508350795118]
BiLingua is a pipeline for Universal Dependencies (UD) annotations for code-switched text. First, we develop a prompt-based framework for Spanish-English and Spanish-Guaraní data. Second, we release two datasets, including the first Spanish-Guaraní-parsed corpus. Third, we conduct a detailed syntactic analysis of switch points across language pairs and communicative contexts.
arXiv Detail & Related papers (2025-06-08T20:23:57Z)
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate [36.641755706551336]
Large language models (LLMs) provide detailed and impressive responses to queries in English. But are they really consistent at responding to the same query in other languages? We propose a framework to evaluate LLM's cross-lingual consistency based on a simple Translate then Evaluate strategy.
arXiv Detail & Related papers (2025-05-28T06:00:21Z)
ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning. We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics. Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
Analysis of LLM as a grammatical feature tagger for African American English [0.6927055673104935]
African American English (AAE) presents unique challenges in natural language processing (NLP). This research systematically compares the performance of available NLP models. This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics.
arXiv Detail & Related papers (2025-02-09T19:46:33Z)
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs). Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO). We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation.
arXiv Detail & Related papers (2024-11-04T06:07:53Z)
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language [0.0]
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and language model (LLM) annotations.
arXiv Detail & Related papers (2024-10-17T08:10:24Z)
Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities. To address these issues, we introduce a technique called SyntaxEval in Syntactic Capabilities.
arXiv Detail & Related papers (2024-01-03T02:44:02Z)
Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions. This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
Measuring Reliability of Large Language Models through Semantic Consistency [3.4990427823966828]
We develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions.
arXiv Detail & Related papers (2022-11-10T20:21:07Z)
Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion [12.758523394180695]
We study the effect on the performance of neural language models (LMs) across nine conversion methods and five languages. On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages. Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
arXiv Detail & Related papers (2022-04-19T03:56:28Z)
On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks. We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments. We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
This work focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z)
GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations. GCNs struggle to model words that have long-range dependencies or that are not directly connected in the dependency tree. We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
Cross-Lingual Adaptation Using Universal Dependencies [1.027974860479791]
We show that models trained using UD parse trees for complex NLP tasks can characterize very different languages. Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages.
arXiv Detail & Related papers (2020-03-24T13:04:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.