SemEval-2020 Task 4: Commonsense Validation and Explanation
- URL: http://arxiv.org/abs/2007.00236v2
- Date: Mon, 3 Aug 2020 15:13:40 GMT
- Title: SemEval-2020 Task 4: Commonsense Validation and Explanation
- Authors: Cunxiang Wang, Shuailong Liang, Yili Jin, Yilong Wang, Xiaodan Zhu and
Yue Zhang
- Abstract summary: SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), includes three subtasks.
We aim to evaluate whether a system can distinguish a natural language statement that makes sense to humans from one that does not.
For Subtask A and Subtask B, the performances of top-ranked systems are close to that of humans.
- Score: 24.389998904122244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present SemEval-2020 Task 4, Commonsense Validation and
Explanation (ComVE), which includes three subtasks, aiming to evaluate whether
a system can distinguish a natural language statement that makes sense to
humans from one that does not, and provide the reasons. Specifically, in our
first subtask, the participating systems are required to choose from two
natural language statements of similar wording which one makes sense and which
one does not. The second subtask additionally asks a system to select the key
reason from three options why a given statement does not make sense. In the
third subtask, a participating system needs to generate the reason. We finally
attracted 39 teams participating in at least one of the three subtasks. For
Subtask A and Subtask B, the performances of top-ranked systems are close to
that of humans. However, for Subtask C, there is still a relatively large gap
between systems and human performance. The dataset used in our task can be
found at
https://github.com/wangcunxiang/SemEval2020-Task4-Commonsense-Validation-and-Explanation;
the leaderboard can be found at
https://competitions.codalab.org/competitions/21080#results.
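For illustration, a minimal sketch of Subtask A (not the official baseline or any participant's system): score each of the two statements with a pretrained masked language model and treat the lower-scoring one as the nonsensical statement. The model name, the pseudo-log-likelihood scoring rule, and the example pair are assumptions made for this sketch.

```python
# Minimal Subtask A sketch: pick the statement a masked LM finds less plausible.
# Assumptions: bert-base-uncased as the scorer; pseudo-log-likelihood as the score.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def pick_nonsensical(stmt_a: str, stmt_b: str) -> str:
    """Return the statement the LM scores as less plausible."""
    return stmt_a if pseudo_log_likelihood(stmt_a) < pseudo_log_likelihood(stmt_b) else stmt_b

# Example pair in the style of the ComVE data:
print(pick_nonsensical("He put a turkey into the fridge.",
                       "He put an elephant into the fridge."))
```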
Related papers
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z) - STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z) - Findings of the WMT 2022 Shared Task on Translation Suggestion [63.457874930232926]
We report the result of the first edition of the WMT shared task on Translation Suggestion.
The task aims to provide alternatives for specific words or phrases given the entire documents generated by machine translation (MT).
It consists of two sub-tasks, namely, the naive translation suggestion and translation suggestion with hints.
arXiv Detail & Related papers (2022-11-30T03:48:36Z) - Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on
Spoken Language Understanding [101.24748444126982]
Decomposable tasks are complex and comprise a hierarchy of sub-tasks.
Existing benchmarks, however, typically hold out examples for only the surface-level sub-task.
We propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions.
arXiv Detail & Related papers (2021-06-29T02:53:59Z) - SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning [47.49596196559958]
This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM).
Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts.
Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world.
Subtask 2 focuses on models' ability to comprehend nonspecific concepts located high in a hypernym hierarchy.
Subtask 3 aims to provide some insights into models' generalizability over the two types of abstractness.
arXiv Detail & Related papers (2021-05-31T11:04:17Z) - ISCAS at SemEval-2020 Task 5: Pre-trained Transformers for
Counterfactual Statement Modeling [48.3669727720486]
ISCAS participated in two subtasks of SemEval 2020 Task 5: detecting counterfactual statements and detecting the antecedent and consequent.
This paper describes our system which is based on pre-trained transformers.
arXiv Detail & Related papers (2020-09-17T09:28:07Z) - SemEval-2020 Task 5: Counterfactual Recognition [36.38097292055921]
Subtask-1 aims to determine whether a given sentence is a counterfactual statement or not.
Subtask-2 requires the participating systems to extract the antecedent and consequent in a given counterfactual statement.
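As an illustration only (not the official method of this task or any participating system), Subtask-2 can be cast as span extraction, for example BIO tagging over tokens with separate labels for the antecedent and the consequent; the tag set and model choice below are assumptions.

```python
# Span-extraction sketch for antecedent/consequent detection via BIO tagging.
# The classification head is untrained here, so predictions are random until fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-ANT", "I-ANT", "B-CON", "I-CON"]    # assumed tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LABELS))

sentence = "If I had left earlier, I would have caught the train."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                    # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(list(zip(tokens, [LABELS[i] for i in pred_ids])))
```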
arXiv Detail & Related papers (2020-08-02T20:32:19Z) - CS-NET at SemEval-2020 Task 4: Siamese BERT for ComVE [2.0491741153610334]
This paper describes a system for distinguishing between statements that conform to common sense and those that do not.
We use parallel instances of a transformer (a Siamese configuration), which gives a boost in performance.
We achieved an accuracy of 94.8% in subtask A and 89% in subtask B on the test set.
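A hedged sketch of what such a parallel (Siamese-style) transformer setup for Subtask A might look like; the actual CS-NET architecture may differ. Both statements pass through one shared BERT encoder, and a small head predicts which statement is nonsensical.

```python
# Siamese-style sketch: one shared encoder, two statements, a 2-way choice head.
# Assumptions: bert-base-uncased encoder, [CLS] pooling, concatenation before the head.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SiameseChooser(nn.Module):
    def __init__(self, name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)      # shared weights for both inputs
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(2 * hidden, 2)                 # which of the two is nonsensical

    def encode(self, batch):
        return self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding

    def forward(self, batch_a, batch_b):
        cls_a, cls_b = self.encode(batch_a), self.encode(batch_b)
        return self.head(torch.cat([cls_a, cls_b], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SiameseChooser()
a = tokenizer(["He put a turkey into the fridge."], return_tensors="pt", padding=True)
b = tokenizer(["He put an elephant into the fridge."], return_tensors="pt", padding=True)
logits = model(a, b)        # untrained head; train with cross-entropy on gold labels
print(logits.shape)         # torch.Size([1, 2])
```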
arXiv Detail & Related papers (2020-07-21T14:08:02Z) - CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and
Prediction with Multi-task Learning [22.534520584497503]
This paper describes our system submitted to task 4 of SemEval 2020: Commonsense Validation and Explanation (ComVE).
The task is to validate whether a given sentence makes sense and to have the model explain why it does not.
Based on the BERT architecture with a multi-task setting, we propose an effective and interpretable "Explain, Reason and Predict" (ERP) system.
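A minimal multi-task sketch in the spirit of that description (not the actual ERP pipeline): one shared BERT encoder with separate heads for validation and explanation selection, trained with a joint loss. The head shapes and loss weighting are assumptions.

```python
# Multi-task sketch: shared encoder, one head per ComVE subtask.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskComVE(nn.Module):
    def __init__(self, name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        hidden = self.encoder.config.hidden_size
        self.validate_head = nn.Linear(hidden, 2)   # Subtask A: makes sense or not
        self.explain_head = nn.Linear(hidden, 1)    # Subtask B: score one (statement, reason) pair;
                                                    # softmax the three option scores at training time

    def forward(self, batch):
        cls = self.encoder(**batch).last_hidden_state[:, 0]
        return self.validate_head(cls), self.explain_head(cls)

# Assumed joint objective: loss = loss_validate + lambda_weight * loss_explain
```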
arXiv Detail & Related papers (2020-06-12T13:51:12Z) - CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP
Deep Learning Architectures on Commonsense Reasoning Task [3.058685580689605]
We describe our entry in the SemEval-2020 Task 4 competition: the Commonsense Validation and Explanation (ComVE) challenge.
Our system uses prepared labeled textual datasets that were manually curated for three different natural language inference subtasks.
For the second subtask, which is to select the reason why a statement does not make sense, we rank within the top six of 27 participating teams with a very competitive result (93.7%).
arXiv Detail & Related papers (2020-05-17T13:20:10Z)