Structural invariants and semantic fingerprints in the "ego network" of
words
- URL: http://arxiv.org/abs/2203.00588v2
- Date: Mon, 3 Apr 2023 06:51:35 GMT
- Title: Structural invariants and semantic fingerprints in the "ego network" of
words
- Authors: Kilian Ollivier and Chiara Boldrini and Andrea Passarella and Marco
Conti
- Abstract summary: We postulate that similar regularities can be found in other cognitive processes, such as language production.
We use a methodology similar to the one used to uncover the well-established social cognitive constraints.
We find regularities at both the structural and semantic level.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Well-established cognitive models coming from anthropology have shown that,
due to the cognitive constraints that limit our "bandwidth" for social
interactions, humans organize their social relations according to a regular
structure. In this work, we postulate that similar regularities can be found in
other cognitive processes, such as those involving language production. In
order to investigate this claim, we analyse a dataset containing tweets of a
heterogeneous group of Twitter users (regular users and professional writers).
Leveraging a methodology similar to the one used to uncover the
well-established social cognitive constraints, we find regularities at both the
structural and semantic level. At the former, we find that a concentric layered
structure (which we call ego network of words, in analogy to the ego network of
social relationships) very well captures how individuals organise the words
they use. The size of the layers in this structure grows regularly when moving
outwards (each layer is approximately 2-3 times larger than the previous one),
and the two outermost layers consistently account for approximately 60% and
30% of the used words, irrespective of the total number of layers of the user.
For the semantic analysis, each ring of
each ego network is described by a semantic profile, which captures the topics
associated with the words in the ring. We find that ring #1 has a special role
in the model. It is semantically the most dissimilar and the most diverse among
the rings. We also show that the topics that are important in the innermost
ring are also predominant in each of the other rings, as well as in the entire
ego network. In this respect, ring #1 can be
seen as the semantic fingerprint of the ego network of words.
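For intuition, the sketch below outlines how such an ego network of words could be computed for a single user. It is a minimal illustration only: the whitespace tokenisation, the use of k-means on log-frequencies to find the layers, the default number of layers, and the topic_of mapping are assumptions made here for clarity, not the paper's actual pipeline.
```python
from collections import Counter
import math

import numpy as np
from sklearn.cluster import KMeans


def ego_network_of_words(tweets, n_layers=5):
    """Partition one user's vocabulary into concentric rings by usage frequency."""
    # Toy whitespace tokenisation; the paper's preprocessing is not reproduced here.
    counts = Counter(w.lower() for tweet in tweets for w in tweet.split())
    words = list(counts)
    # Cluster words on log-frequency; the clusters play the role of the rings.
    X = np.array([[math.log(counts[w])] for w in words])
    labels = KMeans(n_clusters=n_layers, n_init=10, random_state=0).fit_predict(X)
    # Order clusters from most to least frequently used (ring #1 = innermost).
    order = sorted(range(n_layers), key=lambda c: -X[labels == c].mean())
    return [[w for w, lab in zip(words, labels) if lab == c] for c in order]


def layer_scaling_ratios(rings):
    """Size ratio between consecutive layers (layer i = union of rings 1..i)."""
    layer_sizes = np.cumsum([len(r) for r in rings])
    return [layer_sizes[i + 1] / layer_sizes[i] for i in range(len(layer_sizes) - 1)]


def semantic_profile(ring, topic_of):
    """Topic distribution of a ring, given any word -> topic mapping."""
    topics = Counter(topic_of(w) for w in ring)
    total = sum(topics.values())
    return {t: count / total for t, count in topics.items()}
```
On real data, the ratios returned by layer_scaling_ratios would be expected to fall roughly in the 2-3 range reported above, and comparing the semantic profile of ring #1 with those of the outer rings is what motivates calling it a semantic fingerprint.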
Related papers
- QUDsim: Quantifying Discourse Similarities in LLM-Generated Text [70.22275200293964]
We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build QUDsim, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs.
arXiv Detail & Related papers (2025-04-12T23:46:09Z) - Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what can we learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a creative-commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic
Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Neural networks for learning personality traits from natural language [0.0]
This thesis project is highly experimental, and the motivation behind it is to present detailed analyses on the topic.
The starting point is a dictionary of adjectives that psychological literature defines as markers of the five major personality traits, or Big Five.
We use a class of distributional algorithms introduced in 2013 by Tomas Mikolov, which consists of a shallow neural network that learns the contexts of words in an unsupervised way.
arXiv Detail & Related papers (2023-02-23T10:33:40Z) - The Causal Structure of Semantic Ambiguities [0.0]
We identify two features: (1) joint plausibility degrees of different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the processes.
We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility judgements that we collected.
arXiv Detail & Related papers (2022-06-14T12:56:34Z) - Latent Topology Induction for Understanding Contextualized
Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z) - Unsupervised Learning of Hierarchical Conversation Structure [50.29889385593043]
Goal-oriented conversations often have meaningful sub-dialogue structure, but it can be highly domain-dependent.
This work introduces an unsupervised approach to learning hierarchical conversation structure, including turn and sub-dialogue segment labels.
The decoded structure is shown to be useful in enhancing neural models of language for three conversation-level understanding tasks.
arXiv Detail & Related papers (2022-05-24T17:52:34Z) - Seeing Both the Forest and the Trees: Multi-head Attention for Joint
Classification on Different Compositional Levels [15.453888735879525]
In natural languages, words are used in association to construct sentences.
We design a deep neural network architecture that explicitly wires lower and higher linguistic components.
We show that our model, MHAL, learns to solve these joint classification tasks simultaneously, at different levels of granularity.
arXiv Detail & Related papers (2020-11-01T10:44:46Z) - Self-organizing Pattern in Multilayer Network for Words and Syllables [17.69876273827734]
We propose a new universal law that highlights the equally important role of syllables.
When the rank-rank frequency distribution of words and syllables is plotted for English and Chinese corpora, visible lines appear that can be fitted to a master curve.
arXiv Detail & Related papers (2020-05-05T12:01:47Z) - Hierarchical Human Parsing with Typed Part-Relation Reasoning [179.64978033077222]
How to model human structures is the central theme in this task.
We seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures.
arXiv Detail & Related papers (2020-03-10T16:45:41Z) - Where New Words Are Born: Distributional Semantic Analysis of Neologisms
and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.