Rethinking Crowd Sourcing for Semantic Similarity
- URL: http://arxiv.org/abs/2109.11969v1
- Date: Fri, 24 Sep 2021 13:57:30 GMT
- Title: Rethinking Crowd Sourcing for Semantic Similarity
- Authors: Shaul Solomon and Adam Cohn and Hernan Rosenblum and Chezi Hershkovitz
and Ivan P. Yamshchikov
- Abstract summary: This paper investigates the ambiguities inherent in crowd-sourced semantic labeling.
It shows that annotators that treat semantic similarity as a binary category play the most important role in the labeling.
- Score: 0.13999481573773073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimation of semantic similarity is crucial for a variety of natural
language processing (NLP) tasks. In the absence of a general theory of semantic
information, many papers rely on human annotators as the source of ground truth
for semantic similarity estimation. This paper investigates the ambiguities
inherent in crowd-sourced semantic labeling. It shows that annotators that
treat semantic similarity as a binary category (two sentences are either
similar or not similar and there is no middle ground) play the most important
role in the labeling. The paper offers heuristics to filter out unreliable
annotators and stimulates further discussions on human perception of semantic
similarity.
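The heuristics themselves are not spelled out in the abstract; as a hedged, minimal sketch of what filtering out "binary" annotators could look like in practice (the extreme-value test and the 0.9 threshold below are assumptions, not the paper's actual criteria):

```python
from collections import defaultdict

def flag_binary_annotators(ratings, scale=(0, 5), extreme_share=0.9):
    """Flag annotators whose ratings sit almost entirely at the scale extremes.

    ratings: iterable of (annotator_id, score) pairs on a graded similarity scale.
    extreme_share: assumed cutoff; annotators above it are treated as 'binary'
    labelers in the sense discussed in the abstract.
    """
    lo, hi = scale
    per_annotator = defaultdict(list)
    for annotator, score in ratings:
        per_annotator[annotator].append(score)

    flagged = set()
    for annotator, scores in per_annotator.items():
        extremes = sum(1 for s in scores if s in (lo, hi))
        if extremes / len(scores) >= extreme_share:
            flagged.add(annotator)
    return flagged

# Example: annotator "a2" only ever uses 0 or 5 and gets flagged.
sample = [("a1", 3), ("a1", 2), ("a1", 5), ("a2", 0), ("a2", 5), ("a2", 5)]
print(flag_binary_annotators(sample))  # {'a2'}
```

The flagged set could then be excluded or down-weighted before aggregating the crowd-sourced similarity labels.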
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspection of the grouped subwords shows that they exhibit a wide range of semantic similarities.
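As a loose illustration of forming "semantic tokens", assuming that merging amounts to averaging the embeddings of cosine-similar subwords (the paper's actual grouping procedure may differ), a minimal sketch:

```python
import numpy as np

def merge_semantic_tokens(subwords, embeddings, threshold=0.8):
    """Greedily group subwords whose embeddings are cosine-similar and
    average each group's embeddings into a single 'semantic token' vector.

    embeddings: (n_subwords, dim) array; threshold is an assumed cutoff.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unassigned = list(range(len(subwords)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for i in list(unassigned):
            if emb[seed] @ emb[i] >= threshold:
                group.append(i)
                unassigned.remove(i)
        groups.append(group)
    tokens = [[subwords[i] for i in g] for g in groups]
    vectors = np.stack([embeddings[g].mean(axis=0) for g in groups])
    return tokens, vectors

# Toy usage: "run" and "ran" end up in one semantic token, "walk" in another.
words = ["run", "ran", "walk"]
vecs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(merge_semantic_tokens(words, vecs, threshold=0.9))
```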
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding [112.0878081944858]
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations.
Two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones need more detail to be distinguished.
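The paper's descriptive auto-encoding is not reproduced here; as a rough, assumption-laden stand-in for description-based similarity, one can compare compressed description lengths using the normalized compression distance (a related but different complexity-based measure):

```python
import zlib

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two descriptions.
    Smaller values mean the descriptions share more content."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

# Conceptually close captions compress better together than unrelated ones.
print(ncd("a tabby cat sleeping on a sofa", "a striped cat napping on a couch"))
print(ncd("a tabby cat sleeping on a sofa", "a cargo ship entering a harbor"))
```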
arXiv Detail & Related papers (2024-02-14T03:31:17Z)
- Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
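A minimal sketch of such an element-wise Manhattan distance feature, with random vectors standing in for real sentence embeddings and a generic classifier in place of the authors' model (both are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def manhattan_feature(text_emb: np.ndarray, hyp_emb: np.ndarray) -> np.ndarray:
    """Element-wise absolute difference |t - h| used as an entailment feature vector."""
    return np.abs(text_emb - hyp_emb)

# Toy example: random 'embeddings' and labels only to show the pipeline shape.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(100, 16))
hyp_embs = rng.normal(size=(100, 16))
labels = rng.integers(0, 2, size=100)  # 1 = entailment, 0 = no entailment

features = manhattan_feature(text_embs, hyp_embs)
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:5]))
```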
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Measuring Fine-Grained Semantic Equivalence with Abstract Meaning Representation [9.666975331506812]
Identifying semantically equivalent sentences is important for many NLP tasks.
Current approaches to semantic equivalence take a loose, sentence-level approach to "equivalence".
We introduce a novel, more sensitive method of characterizing semantic equivalence that leverages Abstract Meaning Representation graph structures.
arXiv Detail & Related papers (2022-10-06T16:08:27Z)
- Evaluation of taxonomic and neural embedding methods for calculating semantic similarity [0.0]
We study the mechanisms underlying taxonomic and distributional similarity measures.
We find that taxonomic similarity measures can rely on the shortest path length as a prime factor when predicting semantic similarity.
The synergy of retrofitting neural embeddings with concept relations for similarity prediction may indicate a new trend of leveraging knowledge bases in transfer learning.
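For example, a shortest-path-based taxonomic similarity can be computed over WordNet; the snippet below uses NLTK's path_similarity as one standard realization of the idea (whether this exact measure matches the paper's setup is an assumption):

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
car = wn.synset("car.n.01")

# path_similarity = 1 / (shortest_path_length + 1) in the WordNet hypernym graph.
print(dog.path_similarity(cat))  # 0.2 -- short taxonomic path (both carnivores)
print(dog.path_similarity(car))  # much smaller value, longer taxonomic path
```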
arXiv Detail & Related papers (2022-09-30T02:54:21Z)
- Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls in a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
arXiv Detail & Related papers (2021-09-27T13:11:44Z)
- Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness [1.028961895672321]
Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a vector space.
We systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness.
arXiv Detail & Related papers (2020-11-16T14:46:08Z)
- Synonymy = Translational Equivalence [6.198307677263333]
Synonymy and translational equivalence are the relations of sameness of meaning within and across languages.
This paper proposes a unifying treatment of these two relations, which is validated by experiments on existing resources.
arXiv Detail & Related papers (2020-04-28T23:15:02Z)
- Human Correspondence Consensus for 3D Object Semantic Understanding [56.34297279246823]
In this paper, we introduce a new dataset named CorresPondenceNet.
Based on this dataset, we are able to learn dense semantic embeddings with a novel geodesic consistency loss.
We show that CorresPondenceNet could not only boost fine-grained understanding of heterogeneous objects but also cross-object registration and partial object matching.
arXiv Detail & Related papers (2019-12-29T04:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.