C-STS: Conditional Semantic Textual Similarity
- URL: http://arxiv.org/abs/2305.15093v2
- Date: Mon, 6 Nov 2023 18:48:43 GMT
- Title: C-STS: Conditional Semantic Textual Similarity
- Authors: Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari,
Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik
Narasimhan
- Abstract summary: We propose a novel task called Conditional STS (C-STS).
It measures sentences' similarity conditioned on a feature described in natural language (hereon, condition).
C-STS's advantages are two-fold: it reduces the subjectivity and ambiguity of STS and enables fine-grained language model evaluation through diverse natural language conditions.
- Score: 70.09137422955506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic textual similarity (STS), a cornerstone task in NLP, measures the
degree of similarity between a pair of sentences, and has broad application in
fields such as information retrieval and natural language understanding.
However, sentence similarity can be inherently ambiguous, depending on the
specific aspect of interest. We resolve this ambiguity by proposing a novel
task called Conditional STS (C-STS) which measures sentences' similarity
conditioned on a feature described in natural language (hereon, condition). As
an example, the similarity between the sentences "The NBA player shoots a
three-pointer." and "A man throws a tennis ball into the air to serve." is
higher for the condition "The motion of the ball" (both upward) and lower for
"The size of the ball" (one large and one small). C-STS's advantages are
two-fold: (1) it reduces the subjectivity and ambiguity of STS and (2) enables
fine-grained language model evaluation through diverse natural language
conditions. We put several state-of-the-art models to the test, and even those
performing well on STS (e.g. SimCSE, Flan-T5, and GPT-4) find C-STS
challenging; all with Spearman correlation scores below 50. To encourage a more
comprehensive evaluation of semantic similarity and natural language
understanding, we make nearly 19K C-STS examples and code available for others
to train and test their models.
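To make the task concrete, below is a minimal, hypothetical sketch of a conditioned bi-encoder baseline in the spirit of the ones the paper evaluates: the condition is prepended to each sentence before encoding, the conditioned embeddings are compared with cosine similarity, and predictions are scored with Spearman correlation against human ratings. The encoder checkpoint, the "condition [SEP] sentence" template, and the gold labels are illustrative assumptions, not the authors' exact setup.

```python
# A minimal sketch of a conditioned bi-encoder baseline for C-STS.
# Assumptions: the encoder, the "condition [SEP] sentence" template,
# and the gold labels below are illustrative, not the paper's setup.
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

def conditional_similarity(sent1: str, sent2: str, condition: str) -> float:
    # Prepend the condition so the encoder sees the aspect of interest.
    emb = model.encode([f"{condition} [SEP] {sent1}",
                        f"{condition} [SEP] {sent2}"])
    return util.cos_sim(emb[0], emb[1]).item()

# The paper's running example: one sentence pair, two conditions.
s1 = "The NBA player shoots a three-pointer."
s2 = "A man throws a tennis ball into the air to serve."
preds = [conditional_similarity(s1, s2, "The motion of the ball"),  # expect higher
         conditional_similarity(s1, s2, "The size of the ball")]    # expect lower

# Models are scored by Spearman correlation against human ratings
# (toy two-item example here; the paper scores on the full dataset).
golds = [4.0, 1.0]  # hypothetical 1-5 similarity labels
print(spearmanr(preds, golds).correlation)
```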
Related papers
- Linguistically Conditioned Semantic Textual Similarity [6.049872961766425]
We reannotate the C-STS validation set and observe annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels.
We present an automatic error identification pipeline that identifies annotation errors in the C-STS data with an F1 score above 80%.
We propose a new method that largely improves the performance over baselines on the C-STS data by training the models with the answers.
arXiv Detail & Related papers (2024-06-06T01:23:45Z)
- StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding [72.38872974837462]
We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus.
StoryAnalogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory.
We observe that the data in StoryAnalogy can improve the quality of analogy generation in large language models.
arXiv Detail & Related papers (2023-10-19T16:29:23Z)
- Semantic similarity prediction is better than other semantic similarity measures [5.176134438571082]
We argue that when we are only interested in measuring semantic similarity, it is better to directly predict the similarity with a model fine-tuned for that task.
Using a model fine-tuned on the Semantic Textual Similarity Benchmark (STS-B) from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity aligns better with expectations for a robust semantic similarity measure than other approaches (see the sketch after this list).
arXiv Detail & Related papers (2023-09-22T08:11:01Z)
- Collective Human Opinions in Semantic Textual Similarity [36.780812651679376]
We introduce USTS, the first Uncertainty-aware STS dataset with 15,000 Chinese sentence pairs and 150,000 labels.
We show that current STS models cannot capture the variance caused by human disagreement on individual instances.
arXiv Detail & Related papers (2023-08-08T08:00:52Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Identifying Ambiguous Similarity Conditions via Semantic Matching [49.06931755266372]
We introduce Weakly Supervised Conditional Similarity Learning (WS-CSL).
WS-CSL learns multiple embeddings to match semantic conditions without explicit condition labels such as "can fly".
We propose the Distance Induced Semantic COndition VERification Network (DiscoverNet), which characterizes the instance-instance and triplet-condition relations in a "decompose-and-fuse" manner.
arXiv Detail & Related papers (2022-04-08T13:15:55Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
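As a reading aid for the STSScore entry above: the idea is to score a sentence pair directly with a regression model fine-tuned on STS-B, rather than comparing independently computed embeddings. Here is a hedged sketch using a publicly available STS-B cross-encoder checkpoint; the paper's exact model and score scaling are assumptions.

```python
# Sketch of STSScore-style direct similarity prediction: a regression
# model fine-tuned on STS-B scores the pair directly, instead of
# comparing independently computed embeddings. The checkpoint below is
# a public STS-B cross-encoder; the paper's exact model is an assumption.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")

pairs = [
    ("The NBA player shoots a three-pointer.",
     "A man throws a tennis ball into the air to serve."),
]
# predict() returns one similarity score per pair, in [0, 1] for this model.
scores = model.predict(pairs)
print(scores[0])
```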