From Linear Input to Hierarchical Structure: Function Words as Statistical Cues for Language Learning
- URL: http://arxiv.org/abs/2601.21191v1
- Date: Thu, 29 Jan 2026 02:42:12 GMT
- Title: From Linear Input to Hierarchical Structure: Function Words as Statistical Cues for Language Learning
- Authors: Xiulin Yang, Heidi Getz, Ethan Gotlieb Wilcox
- Abstract summary: We argue that function words play a crucial role in language acquisition due to their distinctive distributional properties. We show that language variants preserving all three properties are more easily acquired by neural learners.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What statistical conditions support learning hierarchical structure from linear input? In this paper, we address this question by focusing on the statistical distribution of function words. Function words have long been argued to play a crucial role in language acquisition due to their distinctive distributional properties, including high frequency, reliable association with syntactic structure, and alignment with phrase boundaries. We use cross-linguistic corpus analysis to first establish that all three properties are present across 186 studied languages. Next, we use a combination of counterfactual language modeling and ablation experiments to show that language variants preserving all three properties are more easily acquired by neural learners, with frequency and structural association contributing more strongly than boundary alignment. Follow-up probing and ablation analyses further reveal that different learning conditions lead to systematically different reliance on function words, indicating that similar performance can arise from distinct internal mechanisms.
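The three distributional properties above can be illustrated with a minimal sketch. This is not the paper's actual pipeline; the closed-class word list and the bracketed toy corpus are hypothetical, and only two of the three properties (token frequency and phrase-boundary alignment) are estimated here.

```python
# Minimal illustrative sketch: estimate how frequent function words are
# and how often they align with phrase boundaries in a toy bracketed corpus.
# FUNCTION_WORDS and the corpus below are invented for illustration.
from collections import Counter

FUNCTION_WORDS = {"the", "a", "of", "in", "to", "and"}  # toy closed-class list

# Toy corpus: each sentence is a list of phrases (brackets already resolved).
corpus = [
    [["the", "cat"], ["sat"], ["on", "the", "mat"]],
    [["a", "dog"], ["barked"], ["in", "the", "yard"]],
]

counts = Counter()          # token counts per word
phrase_initial = Counter()  # how often each word starts a phrase
for sentence in corpus:
    for phrase in sentence:
        for i, word in enumerate(phrase):
            counts[word] += 1
            if i == 0:
                phrase_initial[word] += 1

total = sum(counts.values())
func_tokens = sum(counts[w] for w in FUNCTION_WORDS)
print(f"function-word token frequency: {func_tokens / total:.2f}")

# Boundary alignment: proportion of function-word tokens that are phrase-initial.
func_initial = sum(phrase_initial[w] for w in FUNCTION_WORDS)
print(f"P(phrase-initial | function word): {func_initial / func_tokens:.2f}")
```

On a real treebank, the same counts (plus an association measure between function words and syntactic categories) would quantify all three cues the paper manipulates in its counterfactual language variants.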
Related papers
- Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay [4.061135251278187]
Tokenization is a pivotal design choice for neural language modeling in morphologically rich languages. We present the first comprehensive, principled study of Turkish subword tokenization.
arXiv Detail & Related papers (2026-02-06T18:41:14Z) - Deep networks learn to parse uniform-depth context-free languages from local statistics [12.183764229746926]
Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. We introduce a class of probabilistic context-free grammars (PCFGs) in which both the degree of ambiguity and the correlation structure across scales can be controlled. We propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.
arXiv Detail & Related papers (2026-01-31T17:35:06Z) - Vocabulary embeddings organize linguistic structure early in language model training [3.2661767443292646]
Large language models (LLMs) work by manipulating the geometry of input embedding vectors over multiple layers. Here, we ask: how are the input vocabulary representations of language models structured, and how does this structure evolve over training? We run a suite of experiments that correlate the geometric structure of the input embeddings and output embeddings of two open-source models with semantic, syntactic, and frequency-based metrics over the course of training.
arXiv Detail & Related papers (2025-10-08T23:26:22Z) - Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets. We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance. This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z) - Exploring Intra and Inter-language Consistency in Embeddings with ICA [17.87419386215488]
Independent Component Analysis (ICA) creates clearer semantic axes by identifying independent key features.
Previous research has shown ICA's potential to reveal universal semantic axes across languages.
We investigated consistency of semantic axes in two ways: both within a single language and across multiple languages.
arXiv Detail & Related papers (2024-06-18T10:24:50Z) - Linguistic Structure from a Bottleneck on Sequential Information Processing [5.850665541267672]
We show that natural-language-like systematicity arises in codes that are constrained by predictive information.
We show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics.
arXiv Detail & Related papers (2024-05-20T15:25:18Z) - A Joint Matrix Factorization Analysis of Multilingual Representations [28.751144371901958]
We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models.
We study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models.
arXiv Detail & Related papers (2023-10-24T04:43:45Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z) - Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
This work focuses on the contribution of syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z) - Learning Music Helps You Read: Using Transfer to Study Linguistic
Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language.
Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z) - Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the listed information and is not responsible for any consequences of its use.