A quantitative and typological study of Early Slavic participle clauses and their competition
- URL: http://arxiv.org/abs/2405.01972v3
- Date: Wed, 8 May 2024 18:01:50 GMT
- Title: A quantitative and typological study of Early Slavic participle clauses and their competition
- Authors: Nilo Pedrazzini
- Abstract summary: This thesis is a corpus-based, quantitative, and typological analysis of the functions of Early Slavic participle constructions and their finite competitors.
The first part leverages detailed linguistic annotation on Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels.
The second part uses massively parallel data to analyze typological variation in how languages express the semantic space of English 'when'.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This thesis is a corpus-based, quantitative, and typological analysis of the functions of Early Slavic participle constructions and their finite competitors (jegda-'when'-clauses). The first part leverages detailed linguistic annotation on Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels to obtain indirect evidence for different potential functions of participle clauses and their main finite competitor, and to understand the roles of compositionality and default discourse reasoning as explanations for the distribution of participle constructions and jegda-clauses in the corpus. The second part uses massively parallel data to analyze typological variation in how languages express the semantic space of English 'when', whose scope encompasses that of Early Slavic participle constructions and jegda-clauses. Probabilistic semantic maps are generated, and statistical methods (including Kriging, Gaussian Mixture Modelling, and precision and recall analysis) are used to induce cross-linguistically salient dimensions from the parallel corpus and to study conceptual variation within the semantic space of the hypothetical concept WHEN.
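As a rough, self-contained illustration of the second part's method (not the thesis's actual pipeline; the data, variable names, and cluster count below are invented), the sketch derives a two-dimensional semantic map from a cross-linguistic dissimilarity matrix via multidimensional scaling, then applies Gaussian Mixture Modelling to induce candidate senses of WHEN:

```python
# Hypothetical sketch of a probabilistic semantic map pipeline.
# Rows = occurrences of 'when' in a parallel corpus; columns = languages;
# cell values encode which connector/strategy each language uses there.
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in: 200 'when'-contexts, 30 languages, 5 possible strategies.
strategies = rng.integers(0, 5, size=(200, 30))

# Hamming distance between contexts: share of languages that express
# the two contexts with different strategies.
n = strategies.shape[0]
dist = np.zeros((n, n))
for i in range(n):
    dist[i] = (strategies[i] != strategies).mean(axis=1)

# Project contexts into 2D: nearby points are expressed alike across
# languages (the "semantic map").
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

# Gaussian Mixture Modelling induces soft clusters, i.e. candidate
# cross-linguistically salient dimensions of WHEN.
gmm = GaussianMixture(n_components=3, random_state=0).fit(coords)
print(gmm.predict_proba(coords)[:5])  # soft cluster memberships
```

On real data, Kriging could then interpolate the probability of a given language's strategy across the map, and precision/recall analysis could score how well each induced cluster predicts that strategy's distribution.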
Related papers
- Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
We show that several important aspects of the detokenization stage can be understood purely by analyzing model weights.
Our decomposition yields interpretable terms that quantify the relative contributions of position-related, token-related, and mixed effects.
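The bilinearity underlying such a decomposition can be sketched generically (hypothetical dimensions and matrices; not the paper's exact analysis): an attention-style logit over summed token and position embeddings splits exactly into token-related, position-related, and mixed terms that can be quantified from the weights alone.

```python
# Minimal sketch: a bilinear score over (token + position) embeddings
# decomposes exactly into token-token, position-position, and mixed terms.
import numpy as np

rng = np.random.default_rng(0)
d = 16
tok_q, pos_q = rng.normal(size=d), rng.normal(size=d)  # query = token + position
tok_k, pos_k = rng.normal(size=d), rng.normal(size=d)  # key   = token + position
A = rng.normal(size=(d, d))                            # e.g. W_Q^T W_K

full = (tok_q + pos_q) @ A @ (tok_k + pos_k)           # logit on summed inputs

token_term = tok_q @ A @ tok_k                         # token-related effect
position_term = pos_q @ A @ pos_k                      # position-related effect
mixed_term = tok_q @ A @ pos_k + pos_q @ A @ tok_k     # mixed effects

assert np.isclose(full, token_term + position_term + mixed_term)
```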
arXiv Detail & Related papers (2025-01-27T03:45:29Z)
- Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology
This paper explores variation in the expression of generic temporal subordination ('when'-clauses) among the languages of Latin America and the Caribbean.
It presents probabilistic semantic maps computed on the basis of the languages of the region, thus avoiding bias towards the many world's languages that exclusively use lexified connectors.
The approach allows capturing morphological clause-linkage devices in addition to lexified connectors, paving the way for larger-scale, strategy-agnostic analyses of typological variation in temporal subordination.
arXiv Detail & Related papers (2024-04-28T17:43:24Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- A bilingual approach to specialised adjectives through word embeddings in the karstology domain
We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings.
The results of the experiment are then thoroughly analysed and categorised into groups of adjectives exhibiting formal or semantic similarity.
arXiv Detail & Related papers (2022-03-31T08:27:15Z)
- Compositional Generalization Requires Compositional Parsers
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- A Latent-Variable Model for Intrinsic Probing
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Oracle Linguistic Graphs Complement a Pretrained Transformer Language Model: A Cross-formalism Comparison
We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling.
We find that, overall, semantic constituency structures are most useful to language modeling performance.
arXiv Detail & Related papers (2021-12-15T04:29:02Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
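A minimal sketch of such a task-based measure (assuming aligned sentence embeddings from some multilingual encoder; the function name and toy data below are invented):

```python
# Bitext retrieval as an alignment measure: for each source sentence,
# retrieve the nearest target-language sentence by cosine similarity
# and score the share of correct matches.
import numpy as np

def bitext_retrieval_accuracy(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """Fraction of source sentences whose nearest target-language
    neighbour is the true translation, assuming row i of src_emb and
    tgt_emb embed a translation pair."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                    # cosine similarity matrix
    predicted = sims.argmax(axis=1)       # nearest neighbour per source row
    return float((predicted == np.arange(len(src))).mean())

# Toy usage with random vectors standing in for encoder outputs:
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 64))
tgt = src + 0.1 * rng.normal(size=(100, 64))  # noisy "translations"
print(bitext_retrieval_accuracy(src, tgt))
```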
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by the Structured Distributional Model (SDM).
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Decomposing lexical and compositional syntax and semantics with deep language models
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- A Frobenius Algebraic Analysis for Parasitic Gaps
We identify two types of parasitic gapping where the duplication of semantic content can be confined to the lexicon.
For parasitic gaps affecting arguments of the same predicate, the polymorphism is associated with the lexical item that introduces the primary gap.
A compositional translation relates syntactic types and derivations to the interpreting compact closed category of finite dimensional vector spaces.
arXiv Detail & Related papers (2020-05-12T09:36:15Z)