Feature-rich multiplex lexical networks reveal mental strategies of
early language learning
- URL: http://arxiv.org/abs/2201.05061v1
- Date: Thu, 13 Jan 2022 16:44:51 GMT
- Title: Feature-rich multiplex lexical networks reveal mental strategies of
early language learning
- Authors: Salvatore Citraro and Michael S. Vitevitch and Massimo Stella and
Giulio Rossetti
- Abstract summary: We introduce FEature-Rich MUltiplex LEXical (FERMULEX) networks.
Similarities model heterogeneous word associations across semantic/syntactic/phonological aspects of knowledge.
Words are enriched with multi-dimensional feature embeddings including frequency, age of acquisition, length and polysemy.
- Score: 0.7111443975103329
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge in the human mind exhibits a dualistic vector/network nature.
Modelling words as vectors is key to natural language processing, whereas
networks of word associations can map the nature of semantic memory. We
reconcile these paradigms - fragmented across linguistics, psychology and
computer science - by introducing FEature-Rich MUltiplex LEXical (FERMULEX)
networks. This novel framework merges structural similarities in networks and
vector features of words, which can be combined or explored independently.
Similarities model heterogeneous word associations across
semantic/syntactic/phonological aspects of knowledge. Words are enriched with
multi-dimensional feature embeddings including frequency, age of acquisition,
length and polysemy. These aspects enable unprecedented explorations of
cognitive knowledge. Through CHILDES data, we use FERMULEX networks to model
normative language acquisition by 1000 toddlers between 18 and 30 months.
Similarities and embeddings capture word homophily through conformity, a
measure of assortative mixing over network distances and word features.
Conformity unearths a
language kernel of frequent/polysemous/short nouns and verbs key for basic
sentence production, supporting recent evidence of children's syntactic
constructs emerging at 30 months. This kernel is invisible to network
core-detection and feature-only clustering: It emerges from the dual
vector/network nature of words. Our quantitative analysis reveals two key
strategies in early word learning. Modelling word acquisition as random walks
on FERMULEX topology, we highlight non-uniform filling of communicative
developmental inventories (CDIs). Conformity-based walkers predict early word
learning in CDIs with 75% accuracy, 55% precision and 34% recall, providing
quantitative support to previous empirical findings and developmental
theories.
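The abstract combines four computable ingredients: a multiplex of association layers over a shared word set, a feature vector per word, a conformity score that blends network distance with feature similarity, and conformity-biased random walks that gradually fill a vocabulary inventory. The sketch below shows how these pieces could fit together; the toy layers, feature values, conformity formula and walk rule are illustrative assumptions, not the paper's actual definitions.

```python
# Minimal sketch of a FERMULEX-style feature-rich multiplex lexical network.
# The layers, feature values, conformity formula and walk rule below are
# illustrative assumptions, not the definitions used in the paper.
import networkx as nx
import numpy as np

# Multiplex: one graph per association type, over a shared set of words.
layers = {
    "semantic":     nx.Graph([("dog", "cat"), ("cat", "milk")]),
    "phonological": nx.Graph([("dog", "log"), ("cat", "hat")]),
}
words = sorted(set().union(*(g.nodes for g in layers.values())))

# Feature embeddings: log-frequency, age of acquisition (months),
# word length, number of senses (polysemy). Values are made up.
features = {
    "dog":  np.array([5.1, 16.0, 3.0, 2.0]),
    "cat":  np.array([5.3, 15.0, 3.0, 3.0]),
    "milk": np.array([4.8, 17.0, 4.0, 1.0]),
    "log":  np.array([3.2, 28.0, 3.0, 2.0]),
    "hat":  np.array([4.0, 20.0, 3.0, 1.0]),
}

def conformity(word, alpha=0.5):
    """Toy homophily score: blend of inverse network distance and
    feature similarity, averaged over reachable words in every layer."""
    scores = []
    for g in layers.values():
        if word not in g:
            continue
        for other in g.nodes:
            if other == word:
                continue
            try:
                dist = nx.shortest_path_length(g, word, other)
            except nx.NetworkXNoPath:
                continue
            sim = 1.0 / (1.0 + np.linalg.norm(features[word] - features[other]))
            scores.append(alpha / dist + (1.0 - alpha) * sim)
    return float(np.mean(scores)) if scores else 0.0

def walk(start, steps=10, seed=0):
    """Conformity-biased random walk: a toy model of how a communicative
    developmental inventory (CDI) could fill up word by word."""
    rng = np.random.default_rng(seed)
    acquired, current = [start], start
    for _ in range(steps):
        nbrs = sorted(set().union(*(set(g[current]) if current in g else set()
                                    for g in layers.values())))
        if not nbrs:
            break
        w = np.array([conformity(n) for n in nbrs])
        current = str(rng.choice(nbrs, p=w / w.sum() if w.sum() > 0 else None))
        if current not in acquired:
            acquired.append(current)
    return acquired

print({w: round(conformity(w), 3) for w in words})  # per-word conformity
print(walk("dog"))                                  # toy acquisition order
```

In this toy setting, words whose features resemble those of their network neighbours draw the walker earlier, mirroring the non-uniform CDI filling the abstract describes.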
Related papers
- Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models [31.006803764376475]
Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words.
Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning.
Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously.
arXiv Detail & Related papers (2024-06-17T18:01:06Z)
- Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks [0.0]
We train a recurrent neural network on a language identification task over a large database of speech recordings in 21 languages.
The network was able to identify the language of 10-second recordings in 40% of the cases, and the language was in the top-3 guesses in two-thirds of the cases.
arXiv Detail & Related papers (2024-01-22T09:49:44Z)
- Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Open-vocabulary settings have recently been proposed, driven by rapid progress in vision-language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Disentangling Learnable and Memorizable Data via Contrastive Learning for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z)
- Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues [7.332652485849632]
Human infants acquire their verbal lexicon with minimal prior knowledge of language.
This study proposes a novel fully unsupervised learning method for discovering speech units.
The proposed method can acquire words and phonemes from speech signals using unsupervised learning.
arXiv Detail & Related papers (2022-01-18T07:31:59Z)
- Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z)
- Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Methods perform well on images with words within vocabulary but generalize poorly to images with words outside vocabulary.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis (both measures are sketched below).
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
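The two factors in the last entry, semantic sparsity and neighbour frequency growth, can be made concrete with a small distributional sketch. The word vectors, per-year counts, and both measures below are toy stand-ins for illustration, not the paper's formalizations.

```python
# Toy illustration of the two predictors of word emergence: semantic
# sparsity (how thinly populated a word's distributional neighbourhood is)
# and the frequency growth rate of its semantic neighbours. The vectors,
# counts and formulas are stand-ins, not the paper's definitions.
import numpy as np

vectors = {                      # toy distributional embeddings
    "blog":    np.array([0.9, 0.1, 0.0]),
    "website": np.array([0.8, 0.2, 0.1]),
    "diary":   np.array([0.7, 0.3, 0.0]),
}
freq_by_year = {                 # toy corpus counts: {word: {year: count}}
    "website": {1995: 10, 2000: 500},
    "diary":   {1995: 300, 2000: 320},
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def neighbours(word, k=2):
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cos(vectors[word], vectors[w]),
                  reverse=True)[:k]

def semantic_density(word, k=2):
    """Mean similarity to the k nearest neighbours; low density = sparse region."""
    return float(np.mean([cos(vectors[word], vectors[w])
                          for w in neighbours(word, k)]))

def neighbour_growth(word, y0=1995, y1=2000, k=2):
    """Mean relative frequency growth of the word's semantic neighbours."""
    rates = [(freq_by_year[w][y1] - freq_by_year[w][y0]) / freq_by_year[w][y0]
             for w in neighbours(word, k) if w in freq_by_year]
    return float(np.mean(rates)) if rates else 0.0

print(semantic_density("blog"), neighbour_growth("blog"))
```

Under the paper's hypotheses, new words should tend to appear in sparse regions of the space whose neighbours are growing in frequency; in this toy setup those correspond to a low density score and a high growth score.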