What Makes Sentences Semantically Related: A Textual Relatedness Dataset
and Empirical Study
- URL: http://arxiv.org/abs/2110.04845v4
- Date: Mon, 20 Mar 2023 13:34:47 GMT
- Title: What Makes Sentences Semantically Related: A Textual Relatedness Dataset
and Empirical Study
- Authors: Mohamed Abdalla, Krishnapriya Vishnubhotla, Saif M. Mohammad
- Abstract summary: We introduce a dataset for Semantic Textual Relatedness, STR-2022, that has 5,500 English sentence pairs manually annotated.
We show that human intuition regarding relatedness of sentence pairs is highly reliable, with a repeat annotation correlation of 0.84.
We also show the utility of STR-2022 for evaluating automatic methods of sentence representation and for various downstream NLP tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The degree of semantic relatedness of two units of language has long been
considered fundamental to understanding meaning. Additionally, automatically
determining relatedness has many applications such as question answering and
summarization. However, prior NLP work has largely focused on semantic
similarity, a subset of relatedness, because of a lack of relatedness datasets.
In this paper, we introduce a dataset for Semantic Textual Relatedness,
STR-2022, that has 5,500 English sentence pairs manually annotated using a
comparative annotation framework, resulting in fine-grained scores. We show
that human intuition regarding relatedness of sentence pairs is highly
reliable, with a repeat annotation correlation of 0.84. We use the dataset to
explore questions on what makes sentences semantically related. We also show
the utility of STR-2022 for evaluating automatic methods of sentence
representation and for various downstream NLP tasks.
Our dataset, data statement, and annotation questionnaire can be found at:
https://doi.org/10.5281/zenodo.7599667
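One common way to use a relatedness dataset for evaluating sentence representations is to compare the cosine similarity of each pair's embeddings against the human score. The sketch below illustrates that idea under stated assumptions: the toy vectors and gold scores are invented for illustration and are not taken from STR-2022, and Spearman rank correlation is assumed as the agreement metric, since this summary does not say which metric the authors report.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embedding_pairs, gold_scores):
    """Correlate predicted cosine similarities with human relatedness scores."""
    predicted = [cosine(u, v) for u, v in embedding_pairs]
    return spearman(predicted, gold_scores)


# Hypothetical toy data: three sentence pairs as 2-d "embeddings".
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # nearly parallel vectors
    ([1.0, 0.0], [0.5, 0.5]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors
]
toy_gold = [0.9, 0.6, 0.1]  # invented relatedness scores in [0, 1]

rho = evaluate(toy_pairs, toy_gold)
print(f"Spearman correlation: {rho:.3f}")
```

In practice the embeddings would come from whichever sentence encoder is being evaluated, and the gold scores from the released dataset. A rank-based metric is a natural fit here because only the ordering of pairs by relatedness needs to agree, not the absolute values.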
Related papers
- Tübingen-CL at SemEval-2024 Task 1: Ensemble Learning for Semantic Relatedness Estimation [0.0]
The paper introduces our system for SemEval-2024 Task 1, which aims to predict the relatedness of sentence pairs.
We employ an ensemble approach integrating various systems, including statistical textual features and outputs of deep learning models to predict relatedness scores.
arXiv Detail & Related papers (2024-10-14T14:56:51Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Understanding and Mitigating Spurious Correlations in Text
Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis.
We propose a family of regularization methods, NFL (doN't Forget your Language) to mitigate spurious correlations in text classification.
arXiv Detail & Related papers (2023-05-23T03:55:50Z)
- EDeR: A Dataset for Exploring Dependency Relations Between Events [12.215649447070664]
We introduce the human-annotated Event Dependency Relation dataset (EDeR).
We show that recognizing this relation leads to more accurate event extraction.
We demonstrate that predicting the three-way classification into the required argument, optional argument or non-argument is a more challenging task.
arXiv Detail & Related papers (2023-04-04T08:07:07Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- ASPER: Attention-based Approach to Extract Syntactic Patterns denoting
Semantic Relations in Sentential Context [2.175490119265481]
We propose an attention-based supervised deep learning model, ASPER, which extracts syntactic patterns between entities exhibiting a given semantic relation in the sentential context.
We validate the performance of ASPER on three distinct semantic relations -- hyponym-hypernym, cause-effect, and meronym-holonym on six datasets.
For all these semantic relations, ASPER can automatically identify a collection of syntactic patterns reflecting the existence of such a relation between a pair of entities in a sentence.
arXiv Detail & Related papers (2021-04-04T02:36:19Z)
- Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)
- Comparative analysis of word embeddings in assessing semantic similarity
of complex sentences [8.873705500708196]
We study the sentences in existing benchmark datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences.
The results show the increase in complexity of the sentences has a significant impact on the performance of the embedding models.
arXiv Detail & Related papers (2020-10-23T19:55:11Z)
- Learning to Decouple Relations: Few-Shot Relation Classification with
Entity-Guided Attention and Confusion-Aware Training [49.9995628166064]
We propose CTEG, a model equipped with two mechanisms to learn to decouple easily-confused relations.
On the one hand, an Entity-Guided Attention (EGA) mechanism is introduced to guide the attention to filter out information causing confusion.
On the other hand, a Confusion-Aware Training (CAT) method is proposed to explicitly learn to distinguish relations.
arXiv Detail & Related papers (2020-10-21T11:07:53Z)
- Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
arXiv Detail & Related papers (2020-10-05T17:27:31Z)