Topology of Syntax Networks across Languages
- URL: http://arxiv.org/abs/2503.06724v1
- Date: Sun, 09 Mar 2025 18:47:17 GMT
- Title: Topology of Syntax Networks across Languages
- Authors: Juan Soria-Postigo, Luis F Seoane
- Abstract summary: This thesis investigates the structure and properties of syntax networks. It will try to find clusters/phylogenies of languages that share similar network features. Results across different languages will also be compared in an attempt to discover universally preserved structural patterns.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Syntax connects words to each other in very specific ways. Two words are syntactically connected if they depend directly on each other. Syntactic connections usually happen within a sentence. Gathering all those connections across several sentences gives birth to syntax networks. Earlier studies in the field have analysed the structure and properties of syntax networks, trying to find clusters/phylogenies of languages that share similar network features. The results obtained in those studies will be put to the test in this thesis by increasing both the number of languages and the number of properties considered in the analysis. Besides that, language networks of particular languages will be inspected in depth by means of a novel network analysis [25]. Words (nodes of the network) will be clustered into topological communities whose members share similar features. The properties of each of these communities will be thoroughly studied along with the Part of Speech (grammatical class) of each word. Results across different languages will also be compared in an attempt to discover universally preserved structural patterns across syntax networks.
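The abstract's construction, aggregating word-to-word dependency links across many sentences into one network, can be sketched in a few lines. The toy sentences and head-index format below are illustrative assumptions, not data from the thesis; real analyses would start from a dependency-parsed corpus (e.g. Universal Dependencies treebanks).

```python
from collections import defaultdict

# Toy dependency-parsed sentences: each token is (word, head position),
# with head 0 marking the root. Purely illustrative data.
sentences = [
    [("the", 2), ("dog", 3), ("barks", 0)],
    [("the", 2), ("dog", 3), ("sleeps", 0)],
    [("a", 2), ("cat", 3), ("sleeps", 0)],
]

def build_syntax_network(parsed_sentences):
    """Aggregate word-word dependency links across sentences
    into one undirected network (adjacency sets)."""
    graph = defaultdict(set)
    for sent in parsed_sentences:
        for word, head in sent:
            if head == 0:  # the root token has no governor
                continue
            governor = sent[head - 1][0]
            graph[word].add(governor)
            graph[governor].add(word)
    return dict(graph)

net = build_syntax_network(sentences)
print(sorted(net["dog"]))  # -> ['barks', 'sleeps', 'the']
```

Because edges accumulate over the whole corpus, frequent words such as determiners and auxiliaries naturally become hubs, which is what makes the resulting topology linguistically informative.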
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - Complex systems approach to natural language [0.0]
This review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
arXiv Detail & Related papers (2024-01-05T12:01:26Z) - Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z) - Media of Langue: The Interface for Exploring Word Translation Network/Space [0.0]
We describe the huge network formed by the chain of these mutual translations as the Word Translation Network.
We propose Media of Langue, a novel interface for exploring this network.
arXiv Detail & Related papers (2023-08-25T03:54:20Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Topological properties and organizing principles of semantic networks [3.8462776107938317]
We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages.
We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions.
In some networks the connections are similarity-based, while in others the connections are more complementarity-based.
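The basic properties reported for semantic networks (sparsity and high clustering) have simple operational definitions that can be checked on any adjacency structure. The four-node graph below is a hypothetical stand-in, not a ConceptNet extract.

```python
from itertools import combinations

# Tiny undirected toy graph as adjacency sets (illustrative only;
# real semantic networks would be built from e.g. ConceptNet relations).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def density(g):
    """Edges present divided by edges possible; 'sparse' means small."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) // 2
    return 2 * m / (n * (n - 1))

def avg_clustering(g):
    """Mean local clustering coefficient: how often two
    neighbours of a node are themselves connected."""
    coeffs = []
    for node, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(g)

print(density(graph), avg_clustering(graph))
```

A heavy-tailed (e.g. power-law) degree distribution, the third property mentioned above, would be assessed on the same adjacency structure by fitting the histogram of `len(nbrs)` values over a much larger graph.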
arXiv Detail & Related papers (2023-04-24T11:12:21Z) - TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z) - Feature-rich multiplex lexical networks reveal mental strategies of early language learning [0.7111443975103329]
We introduce FEature-Rich MUltiplex LEXical (FERMULEX) networks.
Similarities model heterogeneous word associations across semantic/syntactic/phonological aspects of knowledge.
Words are enriched with multi-dimensional feature embeddings including frequency, age of acquisition, length and polysemy.
arXiv Detail & Related papers (2022-01-13T16:44:51Z) - Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
This work focuses on the contribution of syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z) - Self-organizing Pattern in Multilayer Network for Words and Syllables [17.69876273827734]
We propose a new universal law that highlights the equally important role of syllables.
By plotting rank-rank frequency distribution of word and syllable for English and Chinese corpora, visible lines appear and can be fit to a master curve.
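The rank-rank distribution described above pairs each word's frequency rank with the rank of one of its syllables. The miniature corpus and naive syllable splits below are assumptions for illustration; the cited work uses large English and Chinese corpora.

```python
from collections import Counter

# Hypothetical corpus with hand-made syllable splits (illustrative only).
words = ["data", "data", "network", "network", "network", "syntax"]
syllables = {
    "data": ["da", "ta"],
    "network": ["net", "work"],
    "syntax": ["syn", "tax"],
}

word_freq = Counter(words)
syll_freq = Counter(s for w in words for s in syllables[w])

def ranks(counter):
    """Map each item to its frequency rank (1 = most frequent)."""
    ordered = sorted(counter, key=counter.get, reverse=True)
    return {item: r for r, item in enumerate(ordered, start=1)}

word_rank = ranks(word_freq)
syll_rank = ranks(syll_freq)

# Rank-rank pairs: each word's rank vs. the rank of its first syllable.
pairs = [(word_rank[w], syll_rank[syllables[w][0]]) for w in word_freq]
print(pairs)
```

On a real corpus, plotting these pairs on log-log axes is what would reveal the "visible lines" collapsing onto a master curve.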
arXiv Detail & Related papers (2020-05-05T12:01:47Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z) - Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual
Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.