Semantic properties of English nominal pluralization: Insights from word
embeddings
- URL: http://arxiv.org/abs/2203.15424v1
- Date: Tue, 29 Mar 2022 10:42:47 GMT
- Title: Semantic properties of English nominal pluralization: Insights from word
embeddings
- Authors: Elnaz Shafaei-Bajestan, Masoumeh Moradipour-Tari, Peter Uhrig, R.
Harald Baayen
- Abstract summary: We show that English nominal pluralization exhibits semantic clusters.
A semantically informed method, called CosClassAvg, outperforms pluralization methods in distributional semantics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic differentiation of nominal pluralization is grammaticalized in many
languages. For example, plural markers may only be relevant for human nouns.
English does not appear to make such distinctions. Using distributional
semantics, we show that English nominal pluralization exhibits semantic
clusters. For instance, pluralizations of fruit words are more similar to one
another than to pluralizations of other semantic classes. Therefore,
reduction of the meaning shift in plural formation to the addition of an
abstract plural meaning is too simplistic. A semantically informed method,
called CosClassAvg, is introduced that outperforms pluralization methods in
distributional semantics that assume plural formation amounts to the addition
of a fixed plural vector. Compared with our approach, a method from
compositional distributional semantics, called FRACSS, predicted plural vectors
that were more similar to the corpus-extracted plural vectors in terms of
direction but not vector length. A modeling study reveals that the observed
difference between the semantic spaces predicted by CosClassAvg and by FRACSS
carries over to how well a computational model of the listener can understand
previously unencountered plural forms. Mappings from word forms, represented
with triphone vectors, to predicted semantic vectors are more productive when
CosClassAvg-generated semantic vectors are employed as gold standard vectors
instead of FRACSS-generated vectors.
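The contrast the abstract draws, between adding one fixed plural vector to every singular and adding a per-class average shift in the spirit of CosClassAvg, can be illustrated with a small simulation. All vectors, words, class labels, and the noise scale below are hypothetical stand-ins for the corpus-derived embeddings used in the paper, not its actual data or implementation.

```python
import numpy as np

# Simulated singular embeddings for two semantic classes (hypothetical data).
rng = np.random.default_rng(0)
dim = 8
words = ["apple", "pear", "dog", "cat"]
cls = {"apple": "fruit", "pear": "fruit", "dog": "animal", "cat": "animal"}
sing = {w: rng.normal(size=dim) for w in words}

# Simulate class-dependent plural shifts plus a little noise.
true_shift = {c: rng.normal(size=dim) for c in ("fruit", "animal")}
plur = {w: sing[w] + true_shift[cls[w]] + rng.normal(scale=0.05, size=dim)
        for w in words}

shift = {w: plur[w] - sing[w] for w in words}

# Baseline: one fixed plural vector, averaged over all singular-plural pairs.
fixed = np.mean([shift[w] for w in words], axis=0)

# Class-informed alternative: average the shift within each semantic class.
class_avg = {c: np.mean([shift[w] for w in words if cls[w] == c], axis=0)
             for c in ("fruit", "animal")}

def cos(a, b):
    # Cosine similarity: compares direction, ignoring vector length.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Evaluate predicted plural vectors against the "corpus" plurals by
# direction, the criterion the abstract emphasizes.
cos_fixed = [cos(sing[w] + fixed, plur[w]) for w in words]
cos_class = [cos(sing[w] + class_avg[cls[w]], plur[w]) for w in words]
print(np.mean(cos_fixed), np.mean(cos_class))
```

With class-dependent shifts built into the simulated data, the per-class averages recover the plural direction more closely than the single fixed vector, mirroring in miniature why a one-vector account of pluralization is too simplistic.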
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step on measuring the role of shared semantics among subwords in the encoder-only multilingual language models (mLMs)
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings [23.822788597966646]
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content.
In this paper we explore semantic AWE modelling.
We show -- for the first time -- that AWEs can be used for downstream semantic query-by-example search.
arXiv Detail & Related papers (2023-07-05T07:46:54Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Describing Sets of Images with Textual-PCA [89.46499914148993]
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.
Our procedure is analogous to Principal Component Analysis, in which the role of projection vectors is replaced with generated phrases.
arXiv Detail & Related papers (2022-10-21T17:10:49Z)
- Making sense of spoken plurals [1.80476943513092]
This study focuses on the semantics of noun singulars and their plural inflectional variants in English.
One model (FRACSS) proposes that all singular-plural pairs should be taken into account when predicting plural semantics from singular semantics.
The other model (CCA) argues that conceptualization for plurality depends primarily on the semantic class of the base word.
arXiv Detail & Related papers (2022-07-05T10:44:26Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common subsequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation [50.621269117524925]
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain.
We present a semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment.
We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.
arXiv Detail & Related papers (2021-05-11T13:21:25Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Integrating Categorical Semantics into Unsupervised Domain Translation [6.853826783413853]
We propose a method to learn, in an unsupervised manner, categorical semantic features that are invariant of the source and target domains.
We show that conditioning the style encoder of unsupervised domain translation methods on the learned categorical semantics leads to a translation preserving the digits on MNIST$\leftrightarrow$SVHN.
arXiv Detail & Related papers (2020-10-03T02:40:46Z)
- Principal Word Vectors [5.64434321651888]
We generalize principal component analysis for embedding words into a vector space.
We show that the spread and the discriminability of the principal word vectors are higher than those of other word embedding methods.
arXiv Detail & Related papers (2020-07-09T08:29:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.