'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities
- URL: http://arxiv.org/abs/2202.13581v1
- Date: Mon, 28 Feb 2022 07:12:39 GMT
- Title: 'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities
- Authors: Albert Huang
- Abstract summary: We introduce the Wiki Entity Similarity (WES) dataset, an 11M example, domain targeted, semantic entity similarity dataset that is generated from link texts in Wikipedia.
WES is tailored to QA evaluation: the examples are entities and phrases, grouped into semantic clusters to simulate multiple ground-truth labels.
Human annotators consistently agree with WES labels, and a basic cross-encoder metric outperforms four classic metrics at predicting human judgments of correctness.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classic lexical-matching-based QA metrics are slowly being phased out because
they punish succinct or informative outputs just because those answers were not
provided as ground truth. Recently proposed neural metrics can evaluate
semantic similarity but were trained on small textual similarity datasets
grafted from foreign domains. We introduce the Wiki Entity Similarity (WES)
dataset, an 11M example, domain targeted, semantic entity similarity dataset
that is generated from link texts in Wikipedia. WES is tailored to QA
evaluation: the examples are entities and phrases, grouped into semantic
clusters to simulate multiple ground-truth labels. Human annotators
consistently agree with WES labels, and a basic cross-encoder metric outperforms
four classic metrics at predicting human judgments of correctness.
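The construction the abstract describes — harvesting link texts from Wikipedia and grouping them into per-entity clusters of equivalent names — can be sketched as follows. The `links` examples and the pairing logic are illustrative assumptions, not the actual WES pipeline:

```python
from itertools import combinations

# Toy stand-in for parsed Wikipedia links: (anchor text, linked article title).
# The real dataset is mined from the full Wikipedia link graph.
links = [
    ("NYC", "New York City"),
    ("the Big Apple", "New York City"),
    ("New York", "New York City"),
    ("LA", "Los Angeles"),
    ("the City of Angels", "Los Angeles"),
]

# Cluster anchor texts by the entity (article) they point to.
clusters = {}
for anchor, target in links:
    clusters.setdefault(target, set()).add(anchor)

# Positive pairs: two names for the same entity (simulated ground-truth labels).
positives = [pair for names in clusters.values()
             for pair in combinations(sorted(names), 2)]

# Negative pairs: names drawn from different entity clusters.
entities = sorted(clusters)
negatives = [(a, b) for e1, e2 in combinations(entities, 2)
             for a in sorted(clusters[e1]) for b in sorted(clusters[e2])]
```

Because every anchor text in a cluster is an attested name for the same entity, a QA metric can accept any cluster member as correct instead of a single gold string.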
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
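A minimal sketch of that merging step, assuming toy two-dimensional embeddings and a simple greedy cosine-similarity grouping (the paper's actual procedure may differ):

```python
import math

# Toy subword embeddings; real mLM embeddings are high-dimensional and learned.
emb = {
    "run":  [1.0, 0.1],
    "ning": [0.9, 0.2],   # close to "run", so it should merge
    "haus": [0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def merge_semantic_tokens(emb, threshold=0.9):
    """Greedily group subwords whose embeddings are similar; each group
    becomes one 'semantic token' carrying the mean embedding of its members."""
    groups = []
    for sw, vec in emb.items():
        for group in groups:
            if cosine(vec, group["vec"]) >= threshold:
                group["members"].append(sw)
                n = len(group["members"])
                # Running mean of the member embeddings.
                group["vec"] = [(g * (n - 1) + v) / n
                                for g, v in zip(group["vec"], vec)]
                break
        else:
            groups.append({"members": [sw], "vec": list(vec)})
    return groups
```

With this data, "run" and "ning" fall into one semantic token while "haus" stays separate; the threshold is a free parameter the sketch leaves at an arbitrary 0.9.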
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
The results suggest LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate the limitations of token-level matching.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform those of all competing metrics.
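A rough sketch of sentence-level soft matching in this spirit, using token-overlap F1 as a stand-in matching function (SMART's actual matchers, including model-based ones, differ):

```python
def token_f1(a, b):
    """Stand-in sentence matching function: token-overlap F1.
    A model-based matcher could be substituted here."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def smart_like_score(candidate_sents, reference_sents, match=token_f1):
    """Soft-match at the sentence level: each candidate sentence is scored
    against its best-matching reference sentence, and vice versa."""
    prec = sum(max(match(c, r) for r in reference_sents)
               for c in candidate_sents) / len(candidate_sents)
    rec = sum(max(match(r, c) for c in candidate_sents)
              for r in reference_sents) / len(reference_sents)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Because matching happens per sentence rather than per token, a candidate that reorders or rephrases whole sentences is not penalized for token misalignment across sentence boundaries.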
arXiv Detail & Related papers (2022-08-01T17:58:05Z)
- Evaluation of Semantic Answer Similarity Metrics [0.0]
We propose cross-encoder augmented bi-encoder and BERTScore models for semantic answer similarity, trained on a new dataset consisting of name pairs of US-American public figures.
We provide the first dataset of co-referent name string pairs along with their similarities, which can be used for training.
arXiv Detail & Related papers (2022-06-25T14:40:36Z)
- Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors [14.238125731862658]
We disentangle metric scores along linguistic factors, including semantics, syntax, morphology, and lexical overlap.
We show that the different metrics capture all aspects to some degree, but that they are all substantially sensitive to lexical overlap, just like BLEU and ROUGE.
arXiv Detail & Related papers (2021-10-08T22:40:33Z)
- Semantic Answer Similarity for Evaluating Question Answering Models [2.279676596857721]
SAS is a cross-encoder-based metric for estimating semantic answer similarity.
We show that semantic similarity metrics based on recent transformer models correlate much better with human judgment than traditional lexical similarity metrics.
arXiv Detail & Related papers (2021-08-13T09:12:27Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge [35.66853329610162]
PARADE contains paraphrases that overlap very little at the lexical and syntactic level but are semantically equivalent based on computer science domain knowledge.
Experiments show that both state-of-the-art neural models and non-expert human annotators have poor performance on PARADE.
arXiv Detail & Related papers (2020-10-08T02:01:31Z)
- Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
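The idea can be sketched with a prefix trie over entity names that constrains greedy left-to-right generation; the scoring function here is a hypothetical stand-in for the model's token probabilities, not GENRE's actual decoder:

```python
def build_trie(names):
    """Prefix trie over whitespace tokens of each entity name."""
    trie = {}
    for name in names:
        node = trie
        for tok in name.split():
            node = node.setdefault(tok, {})
        node["<eos>"] = {}
    return trie

def constrained_decode(trie, score):
    """Greedy left-to-right generation: at each step, pick the highest-scoring
    token among those the trie allows, so the output is always a valid name."""
    out, node = [], trie
    while True:
        allowed = list(node)
        tok = max(allowed, key=lambda t: score(out, t))
        if tok == "<eos>":
            return " ".join(out)
        out.append(tok)
        node = node[tok]

entities = ["New York City", "New York Times", "Los Angeles"]
trie = build_trie(entities)

# Hypothetical scorer that prefers tokens appearing in the query string.
query = "Which paper did the Times in New York publish?"
score = lambda prefix, tok: 1.0 if tok.lower() in query.lower() else 0.0
```

The trie guarantees the decoder can only emit complete, valid entity names, which is what lets a generative model double as an entity retriever.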
arXiv Detail & Related papers (2020-10-02T10:13:31Z)
- Human Correspondence Consensus for 3D Object Semantic Understanding [56.34297279246823]
In this paper, we introduce a new dataset named CorresPondenceNet.
Based on this dataset, we are able to learn dense semantic embeddings with a novel geodesic consistency loss.
We show that CorresPondenceNet boosts not only fine-grained understanding of heterogeneous objects but also cross-object registration and partial object matching.
arXiv Detail & Related papers (2019-12-29T04:24:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.