Related papers: Type and Complexity Signals in Multilingual Question Representations

Type and Complexity Signals in Multilingual Question Representations

URL: http://arxiv.org/abs/2510.06304v1
Date: Tue, 07 Oct 2025 17:50:05 GMT
Title: Type and Complexity Signals in Multilingual Question Representations
Authors: Robin Kokot, Wessel Poelman,
Abstract summary: We introduce the Question Type and Complexity dataset with sentences across seven languages.<n>We compare layer-wise probes on frozen Glot500-m representations against subword TF-IDF baselines, and a fine-tuned model.<n>Results show that statistical features classify questions effectively in languages with explicit marking, while neural probes capture fine-grained structural complexity patterns better.
Score: 0.27930955543692815
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work investigates how a multilingual transformer model represents morphosyntactic properties of questions. We introduce the Question Type and Complexity (QTC) dataset with sentences across seven languages, annotated with type information and complexity metrics including dependency length, tree depth, and lexical density. Our evaluation extends probing methods to regression labels with selectivity controls to quantify gains in generalizability. We compare layer-wise probes on frozen Glot500-m (Imani et al., 2023) representations against subword TF-IDF baselines, and a fine-tuned model. Results show that statistical features classify questions effectively in languages with explicit marking, while neural probes capture fine-grained structural complexity patterns better. We use these results to evaluate when contextual representations outperform statistical baselines and whether parameter updates reduce the availability of pre-trained linguistic information.

Related papers

Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks [7.216732751280017]
We correlate Tokenization Parity (TP) and Information Parity (IP) as measures of representational biases in pre-trained multilingual models.<n>We compare state-of-the-art decoder-only LLMs with encoder-based models across three tasks: dialect classification, topic classification, and extractive question answering.<n>Our analysis reveals that TP is a better predictor of the performance on tasks reliant on syntactic and morphological cues, while IP better predicts performance in semantic tasks.
arXiv Detail & Related papers (2025-09-24T12:13:53Z)
How Compositional Generalization and Creativity Improve as Diffusion Models are Trained [82.08869888944324]
How many samples do generative models need in order to learn composition rules?<n>What signal in the data is exploited to learn those rules?<n>We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics.
arXiv Detail & Related papers (2025-02-17T18:06:33Z)
Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.<n>We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
Morphosyntactic probing of multilingual BERT models [41.83131308999425]
We introduce an extensive dataset for multilingual probing of morphological information in language models. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks.
arXiv Detail & Related papers (2023-06-09T19:15:20Z)
Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs)<n>We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.<n>Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
Cross-Lingual Transfer of Cognitive Processing Complexity [11.939409227407769]
We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity. We show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages.
arXiv Detail & Related papers (2023-02-24T15:48:23Z)
A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification. The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample. A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Syntax-informed Question Answering with Heterogeneous Graph Transformer [2.139714421848487]
We present a linguistics-informed question answering approach that extends and fine-tunes a pre-trained neural language model. We illustrate the approach by the addition of syntactic information in the form of dependency and constituency graphic structures connecting tokens and virtual tokens.
arXiv Detail & Related papers (2022-04-01T07:48:03Z)
Investigating representations of verb bias in neural language models [7.455546102930909]
We introduce DAIS, a benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences.
arXiv Detail & Related papers (2020-10-05T22:39:08Z)
Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers. We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.