Rethinking Crowd Sourcing for Semantic Similarity
- URL: http://arxiv.org/abs/2109.11969v1
- Date: Fri, 24 Sep 2021 13:57:30 GMT
- Title: Rethinking Crowd Sourcing for Semantic Similarity
- Authors: Shaul Solomon and Adam Cohn and Hernan Rosenblum and Chezi Hershkovitz
and Ivan P. Yamshchikov
- Abstract summary: This paper investigates the ambiguities inherent in crowd-sourced semantic labeling.
It shows that annotators that treat semantic similarity as a binary category play the most important role in the labeling.
- Score: 0.13999481573773073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimation of semantic similarity is crucial for a variety of natural
language processing (NLP) tasks. In the absence of a general theory of semantic
information, many papers rely on human annotators as the source of ground truth
for semantic similarity estimation. This paper investigates the ambiguities
inherent in crowd-sourced semantic labeling. It shows that annotators that
treat semantic similarity as a binary category (two sentences are either
similar or not similar and there is no middle ground) play the most important
role in the labeling. The paper offers heuristics to filter out unreliable
annotators and stimulates further discussions on human perception of semantic
similarity.
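The heuristics themselves are not spelled out in the abstract; as a hedged, minimal sketch of what filtering out "binary" annotators could look like in practice (the extreme-value test and the 0.9 threshold below are assumptions, not the paper's actual criteria):

```python
from collections import defaultdict

def flag_binary_annotators(ratings, scale=(0, 5), extreme_share=0.9):
    """Flag annotators whose ratings sit almost entirely at the scale extremes.

    ratings: iterable of (annotator_id, score) pairs on a graded similarity scale.
    extreme_share: assumed cutoff; annotators above it are treated as 'binary'
    labelers in the sense discussed in the abstract.
    """
    lo, hi = scale
    per_annotator = defaultdict(list)
    for annotator, score in ratings:
        per_annotator[annotator].append(score)

    flagged = set()
    for annotator, scores in per_annotator.items():
        extremes = sum(1 for s in scores if s in (lo, hi))
        if extremes / len(scores) >= extreme_share:
            flagged.add(annotator)
    return flagged

# Example: annotator "a2" only ever uses 0 or 5 and gets flagged.
sample = [("a1", 3), ("a1", 2), ("a1", 5), ("a2", 0), ("a2", 5), ("a2", 5)]
print(flag_binary_annotators(sample))  # {'a2'}
```

The flagged set could then be excluded or down-weighted before aggregating the crowd-sourced similarity labels.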
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspection of the grouped subwords shows that they exhibit a wide range of semantic similarities.
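As a loose illustration of forming "semantic tokens", assuming that merging amounts to averaging the embeddings of cosine-similar subwords (the paper's actual grouping procedure may differ), a minimal sketch:

```python
import numpy as np

def merge_semantic_tokens(subwords, embeddings, threshold=0.8):
    """Greedily group subwords whose embeddings are cosine-similar and
    average each group's embeddings into a single 'semantic token' vector.

    embeddings: (n_subwords, dim) array; threshold is an assumed cutoff.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unassigned = list(range(len(subwords)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for i in list(unassigned):
            if emb[seed] @ emb[i] >= threshold:
                group.append(i)
                unassigned.remove(i)
        groups.append(group)
    tokens = [[subwords[i] for i in g] for g in groups]
    vectors = np.stack([embeddings[g].mean(axis=0) for g in groups])
    return tokens, vectors

# Toy usage: "run" and "ran" end up in one semantic token, "walk" in another.
words = ["run", "ran", "walk"]
vecs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(merge_semantic_tokens(words, vecs, threshold=0.9))
```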
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding [112.0878081944858]
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations.
Two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones need more detail to be distinguished.
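The paper's descriptive auto-encoding is not reproduced here; as a rough, assumption-laden stand-in for description-based similarity, one can compare compressed description lengths using the normalized compression distance (a related but different complexity-based measure):

```python
import zlib

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two descriptions.
    Smaller values mean the descriptions share more content."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

# Conceptually close captions compress better together than unrelated ones.
print(ncd("a tabby cat sleeping on a sofa", "a striped cat napping on a couch"))
print(ncd("a tabby cat sleeping on a sofa", "a cargo ship entering a harbor"))
```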
arXiv Detail & Related papers (2024-02-14T03:31:17Z)
- Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
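A minimal sketch of such an element-wise Manhattan distance feature, with random vectors standing in for real sentence embeddings and a generic classifier in place of the authors' model (both are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def manhattan_feature(text_emb: np.ndarray, hyp_emb: np.ndarray) -> np.ndarray:
    """Element-wise absolute difference |t - h| used as an entailment feature vector."""
    return np.abs(text_emb - hyp_emb)

# Toy example: random 'embeddings' and labels only to show the pipeline shape.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(100, 16))
hyp_embs = rng.normal(size=(100, 16))
labels = rng.integers(0, 2, size=100)  # 1 = entailment, 0 = no entailment

features = manhattan_feature(text_embs, hyp_embs)
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:5]))
```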
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Measuring Fine-Grained Semantic Equivalence with Abstract Meaning Representation [9.666975331506812]
Identifying semantically equivalent sentences is important for many NLP tasks.
Current approaches to semantic equivalence take a loose, sentence-level approach to "equivalence".
We introduce a novel, more sensitive method of characterizing semantic equivalence that leverages Abstract Meaning Representation graph structures.
arXiv Detail & Related papers (2022-10-06T16:08:27Z)
- Evaluation of taxonomic and neural embedding methods for calculating semantic similarity [0.0]
We study the mechanisms underlying taxonomic and distributional similarity measures.
We find that taxonomic similarity measures can rely on the shortest path length as a prime factor when predicting semantic similarity.
The synergy of retrofitting neural embeddings with concept relations for similarity prediction may indicate a new trend of leveraging knowledge bases in transfer learning.
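For example, a shortest-path-based taxonomic similarity can be computed over WordNet; the snippet below uses NLTK's path_similarity as one standard realization of the idea (whether this exact measure matches the paper's setup is an assumption):

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
car = wn.synset("car.n.01")

# path_similarity = 1 / (shortest_path_length + 1) in the WordNet hypernym graph.
print(dog.path_similarity(cat))  # 0.2 -- short taxonomic path (both carnivores)
print(dog.path_similarity(car))  # much smaller value, longer taxonomic path
```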
arXiv Detail & Related papers (2022-09-30T02:54:21Z)
- Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls in a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
arXiv Detail & Related papers (2021-09-27T13:11:44Z)
- Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness [1.028961895672321]
Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a vector space.
We systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness.
arXiv Detail & Related papers (2020-11-16T14:46:08Z)
- Synonymy = Translational Equivalence [6.198307677263333]
Synonymy and translational equivalence are the relations of sameness of meaning within and across languages.
This paper proposes a unifying treatment of these two relations, which is validated by experiments on existing resources.
arXiv Detail & Related papers (2020-04-28T23:15:02Z)
- Human Correspondence Consensus for 3D Object Semantic Understanding [56.34297279246823]
In this paper, we introduce a new dataset named CorresPondenceNet.
Based on this dataset, we are able to learn dense semantic embeddings with a novel geodesic consistency loss.
We show that CorresPondenceNet could not only boost fine-grained understanding of heterogeneous objects but also cross-object registration and partial object matching.
arXiv Detail & Related papers (2019-12-29T04:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.