The Algebraic Structure of Morphosyntax
- URL: http://arxiv.org/abs/2507.00244v1
- Date: Mon, 30 Jun 2025 20:26:32 GMT
- Title: The Algebraic Structure of Morphosyntax
- Authors: Isabella Senturia, Matilde Marcolli
- Abstract summary: We present a mathematical model of the morphology-syntax interface. In this setting, morphology has compositional properties responsible for word formation, organized into a magma of morphological trees. We reinterpret in this setting certain operations of Distributed Morphology as transformations that allow for flexibility in moving the boundary between syntax and morphology within the morphosyntactic objects.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Within the context of the mathematical formulation of Merge and the Strong Minimalist Thesis, we present a mathematical model of the morphology-syntax interface. In this setting, morphology has compositional properties responsible for word formation, organized into a magma of morphological trees. However, unlike syntax, we do not have movement within morphology. A coproduct decomposition exists, but it requires extending the set of morphological trees beyond those which are generated solely by the magma, to a larger set of possible morphological inputs to syntactic trees. These participate in the formation of morphosyntactic trees as an algebra over an operad, and a correspondence between algebras over an operad. The process of structure formation for morphosyntactic trees can then be described in terms of this operadic correspondence that pairs syntactic and morphological data and the morphology coproduct. We reinterpret in this setting certain operations of Distributed Morphology as transformations that allow for flexibility in moving the boundary between syntax and morphology within the morphosyntactic objects.
Related papers
- PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation [50.80441546742053]
Phylogenetic trees elucidate evolutionary relationships among species. Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model.
arXiv Detail & Related papers (2024-12-25T08:33:05Z) - Geometric Signatures of Compositionality Across a Language Model's Lifetime [47.25475802128033]
We study whether contemporary language models reflect intrinsic simplicity of language enabled by compositionality. We find that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Our analyses reveal a striking contrast between nonlinear and linear dimensionality, showing they respectively encode semantic and superficial aspects of linguistic composition.
arXiv Detail & Related papers (2024-10-02T11:54:06Z) - Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z) - Unsupervised Morphological Tree Tokenizer [36.584680344291556]
We introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words.
Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named $\textit{Overriding}$ to ensure the indecomposability of morphemes.
Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner.
arXiv Detail & Related papers (2024-06-21T15:35:49Z) - Labeled Morphological Segmentation with Semi-Markov Models [127.69031138022534]
We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.
We additionally introduce a new hierarchy of morphotactic tagsets.
We develop modelname, a discriminative morphological segmentation system that explicitly models morphotactics.
arXiv Detail & Related papers (2024-04-13T12:51:53Z) - MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation [38.87351909710185]
This paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation.
MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure.
Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin.
arXiv Detail & Related papers (2024-01-17T09:03:14Z) - UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z) - Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Natural Language Processing (NLP) literature typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z) - A Formal Description of Sorani Kurdish Morphology [0.0]
Sorani Kurdish, also known as Central Kurdish, has a complex morphology.
We provide a detailed description of Sorani Kurdish morphological and morphophonological constructions in a formal way.
arXiv Detail & Related papers (2021-09-08T21:34:26Z) - Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z) - Modelling Verbal Morphology in Nen [4.6877729174041605]
We use state-of-the-art machine learning models for morphological reinflection to model Nen verbal morphology.
Our results show sensitivity to training data composition; different distributions of verb type yield different accuracies.
We also demonstrate the types of patterns that can be inferred from the training data through the case study of syncretism.
arXiv Detail & Related papers (2020-11-30T01:22:05Z) - Neural Modeling for Named Entities and Morphology (NEMO^2) [9.092452284460283]
We develop a novel NER benchmark for Modern Hebrew, a morphologically rich-and-ambiguous language.
We show that explicitly modeling morphological boundaries leads to improved NER performance.
A novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline.
arXiv Detail & Related papers (2020-07-30T17:43:14Z) - A Frobenius Algebraic Analysis for Parasitic Gaps [4.254099382808598]
We identify two types of parasitic gapping where the duplication of semantic content can be confined to the lexicon.
For parasitic gaps affecting arguments of the same predicate, the polymorphism is associated with the lexical item that introduces the primary gap.
A compositional translation relates syntactic types and derivations to the interpreting compact closed category of finite dimensional vector spaces.
arXiv Detail & Related papers (2020-05-12T09:36:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.