The aftermath of compounds: Investigating Compounds and their Semantic Representations
- URL: http://arxiv.org/abs/2510.27477v1
- Date: Fri, 31 Oct 2025 13:58:41 GMT
- Title: The aftermath of compounds: Investigating Compounds and their Semantic Representations
- Authors: Swarang Joshi
- Abstract summary: We compare static word vectors (GloVe) and contextualized embeddings (BERT) against human ratings of lexeme meaning dominance (LMD) and semantic transparency (ST). Our results show that BERT embeddings better capture compositional semantics than GloVe, and that predictability ratings are strong predictors of semantic transparency in both human and model data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates how well computational embeddings align with human semantic judgments in the processing of English compound words. We compare static word vectors (GloVe) and contextualized embeddings (BERT) against human ratings of lexeme meaning dominance (LMD) and semantic transparency (ST) drawn from a psycholinguistic dataset. Using measures of association strength (Edinburgh Associative Thesaurus), frequency (BNC), and predictability (LaDEC), we compute embedding-derived LMD and ST metrics and assess their relationships with human judgments via Spearman's correlation and regression analyses. Our results show that BERT embeddings better capture compositional semantics than GloVe, and that predictability ratings are strong predictors of semantic transparency in both human and model data. These findings advance computational psycholinguistics by clarifying the factors that drive compound word processing and offering insights into embedding-based semantic modeling.
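The abstract mentions embedding-derived ST metrics compared with human ratings via Spearman's correlation, but gives no formulas. The sketch below shows one plausible reading, not the paper's confirmed method: it assumes ST can be approximated by the mean cosine similarity between a compound's vector and its two constituents' vectors. The function name `transparency_proxy`, the three-dimensional vectors, and the ratings are all invented for illustration and are not taken from GloVe, BERT, or LaDEC.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def transparency_proxy(compound, modifier, head):
    """Hypothetical ST proxy: mean similarity of compound to its constituents."""
    return (cosine(compound, modifier) + cosine(compound, head)) / 2


# Toy, invented vectors: (compound, modifier, head) per item.
toy_items = {
    "doorbell": ([0.9, 0.8, 0.1], [0.9, 0.2, 0.1], [0.3, 0.9, 0.1]),
    "snowball": ([0.7, 0.6, 0.3], [0.9, 0.1, 0.2], [0.2, 0.9, 0.2]),
    "hogwash": ([0.1, 0.2, 0.9], [0.9, 0.1, 0.1], [0.1, 0.9, 0.2]),
}
# Toy, invented human transparency ratings for the same items.
toy_human_st = {"doorbell": 6.5, "snowball": 5.8, "hogwash": 2.1}

names = sorted(toy_items)
model_st = [transparency_proxy(*toy_items[name]) for name in names]
human_st = [toy_human_st[name] for name in names]
print("Spearman rho (toy data):", spearman(model_st, human_st))
```

With real data, the toy dictionaries would be replaced by vectors looked up from a trained model and by ratings from the psycholinguistic dataset. An analogous LMD proxy would need its own definition, which the abstract does not specify.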
Related papers
- SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation [55.26111461168754]
We introduce SMILE: Semantic Metric Integrating Lexical Exactness, a novel approach that combines sentence-level semantic understanding with keyword-level semantic understanding and easy keyword matching. It is highly correlated with human judgments and computationally lightweight, bridging the gap between lexical and semantic evaluation.
arXiv Detail & Related papers (2025-11-21T17:30:18Z) - An Exploratory Analysis on the Explanatory Potential of Embedding-Based Measures of Semantic Transparency for Malay Word Recognition [0.0]
We explore embedding-based measures of semantic transparency. We investigate whether these measures are significant predictors of lexical decision latencies. All measures predicted decision latencies after accounting for word frequency, word length, and morphological family size.
arXiv Detail & Related papers (2025-05-09T11:57:10Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - A Psycholinguistic Analysis of BERT's Representations of Compounds [3.034345346208211]
We build on studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions.
We leverage a dataset that includes human judgments on two psycholinguistic measures of compound semantic analysis.
We show that BERT-based measures moderately align with human intuitions, especially when using contextualized representations.
arXiv Detail & Related papers (2023-02-14T18:23:15Z) - Synonym Detection Using Syntactic Dependency And Neural Embeddings [3.0770051635103974]
We study the role of syntactic dependencies in deriving distributional semantics using the Vector Space Model.
We study the effectiveness of injecting human-compiled semantic knowledge into neural embeddings on computing distributional similarity.
Our results show that the syntactically conditioned contexts can interpret lexical semantics better than the unconditioned ones.
arXiv Detail & Related papers (2022-09-30T03:16:41Z) - Language Models Explain Word Reading Times Better Than Empirical Predictability [20.38397241720963]
The traditional approach in cognitive reading research assumes that word predictability from sentence context is best captured by cloze completion probability.
Probability language models provide deeper explanations for syntactic and semantic effects than CCP.
N-gram and RNN probabilities of the present word more consistently predicted reading performance compared with topic models or CCP.
arXiv Detail & Related papers (2022-02-02T16:38:43Z) - WMDecompose: A Framework for Leveraging the Interpretable Properties of Word Mover's Distance in Sociocultural Analysis [0.0]
One popular model that balances legibility and interpretability is Word Mover's Distance (WMD).
We introduce WMDecompose: a model and Python library that decomposes document-level distances into their constituent word-level distances, and subsequently clusters words to induce thematic elements.
arXiv Detail & Related papers (2021-10-14T13:04:38Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed metric, Neighboring Distribution Divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformers Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)