QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation
system based on ensemble of language model
- URL: http://arxiv.org/abs/2009.02645v1
- Date: Sun, 6 Sep 2020 05:12:50 GMT
- Title: QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation
system based on ensemble of language model
- Authors: Pai Liu
- Abstract summary: In this paper, we present the language model system submitted to the SemEval-2020 Task 4 competition, "Commonsense Validation and Explanation".
We apply transfer learning, taking pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tuning them on this task.
The ensembled model solves this problem better, reaching 95.9% accuracy on subtask A, only 3% below human accuracy.
- Score: 2.728575246952532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present the language model system submitted to the
SemEval-2020 Task 4 competition, "Commonsense Validation and Explanation". We
participate in two subtasks: subtask A (Validation) and subtask B (Explanation).
We apply transfer learning, taking pretrained language models (BERT,
XLNet, RoBERTa, and ALBERT) and fine-tuning them on this task. We then compare
their characteristics on this task to help future researchers understand and
use these models more appropriately. The ensembled model solves this problem
better, reaching 95.9% accuracy on subtask A, only 3% below human
accuracy.
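The abstract does not say how the four fine-tuned models are combined, so the following is only a minimal sketch of one common option, soft voting, in which each model's class probabilities are averaged before taking the argmax. The function names, the equal default weights, and the logit values are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(
    model_logits: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-model class probabilities (soft voting)."""
    if weights is None:
        # Assumption: every model gets the same weight.
        weights = {name: 1.0 for name in model_logits}
    weight_sum = sum(weights[name] for name in model_logits)
    num_classes = len(next(iter(model_logits.values())))
    averaged = [0.0] * num_classes
    for name, logits in model_logits.items():
        for i, p in enumerate(softmax(logits)):
            averaged[i] += weights[name] * p / weight_sum
    return averaged


def ensemble_predict(model_logits: Dict[str, Sequence[float]]) -> int:
    """Index of the class with the highest averaged probability."""
    probs = ensemble_probs(model_logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for one subtask A example: index 0 / 1 stands for
# "statement 0 / statement 1 is the one against common sense".
example_logits = {
    "bert": [0.2, 0.1],
    "xlnet": [-1.0, 2.0],
    "roberta": [0.0, 1.5],
    "albert": [0.3, 0.0],
}
prediction = ensemble_predict(example_logits)
```

In this made-up example a plain majority vote would be tied two against two, whereas averaging probabilities lets the more confident models decide. That is one reason soft voting is often preferred when the individual models are reasonably well calibrated.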
Related papers
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- Large Language Models in the Workplace: A Case Study on Prompt
Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with
Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
Vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language
Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model.
Inspired by the similarity between the ReCAM task and language pre-training, we propose a simple yet effective technique, namely negative augmentation with a language model.
Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
- LRG at SemEval-2021 Task 4: Improving Reading Comprehension with
Abstract Words using Augmentation, Linguistic Features and Voting [0.6850683267295249]
Given a fill-in-the-blank-type question, the task is to predict the most suitable word from a list of 5 options.
We use encoders of transformers-based models pre-trained on the masked language modelling (MLM) task to build our Fill-in-the-blank (FitB) models.
We propose variants, namely Chunk Voting and Max Context, to take care of input length restrictions for BERT, etc.
arXiv Detail & Related papers (2021-02-24T12:33:12Z)
- When Can Models Learn From Explanations? A Formal Framework for
Understanding the Roles of Explanation Data [84.87772675171412]
We study the circumstances under which explanations of individual data points can improve modeling performance.
We make use of three existing datasets with explanations: e-SNLI, TACRED, SemEval.
arXiv Detail & Related papers (2021-02-03T18:57:08Z)
- BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [1.433758865948252]
This paper describes the work of the BUT-FIT team at SemEval 2020 Task 4 - Commonsense Validation and Explanation.
In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation.
We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset.
We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss.
arXiv Detail & Related papers (2020-08-17T12:45:39Z)
- LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation
using Pretraining Language Model [5.428461405329692]
This paper describes our submission to subtasks A and B of SemEval-2020 Task 4.
For subtask A, we use an ALBERT-based model with an improved input form to pick out the commonsense statement from two candidate statements.
For subtask B, we use a multiple-choice model enhanced by a hint sentence mechanism to select, from the given options, the reason why a statement is against common sense.
arXiv Detail & Related papers (2020-07-06T05:51:10Z)
- KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for
Comprehension And Generation [4.94950858749529]
We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbones for the three subtasks.
The results show that our evidence-searching approach improves model performance on the commonsense explanation task.
arXiv Detail & Related papers (2020-05-24T15:09:21Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)