Related papers: QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model

QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model

URL: http://arxiv.org/abs/2009.02645v1
Date: Sun, 6 Sep 2020 05:12:50 GMT
Title: QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model
Authors: Pai Liu
Abstract summary: In this paper, we present language model system submitted to SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation" We implemented with transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task. The ensembled model better solves this problem, making the model's accuracy reached 95.9% on subtask A, which just worse than human's by only 3% accuracy.
Score: 2.728575246952532
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present language model system submitted to SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation". We participate in two subtasks for subtask A: validation and subtask B: Explanation. We implemented with transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task. Then we compared their characteristics in this task to help future researchers understand and use these models more properly. The ensembled model better solves this problem, making the model's accuracy reached 95.9% on subtask A, which just worse than human's by only 3% accuracy.

Related papers

Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset [0.16385815610837165]
This study evaluates the performance of Large Language Models (LLMs) on SemEval-2020 Task 4 dataset. The models are tested on two tasks: Task A (Commonsense Validation), where models determine whether a statement aligns with commonsense knowledge, and Task B (Commonsense Explanation) Results indicate that larger models outperform previous models, with LLaMA3-70B achieving the highest accuracy of 98.40% in Task A whereas, lagging behind previous models with 93.40% in Task B.
arXiv Detail & Related papers (2025-02-19T12:40:49Z)
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities. TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models. Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks. Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting. The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages. Our largest model sets new state of the art in few-shot learning in more than 20 representative languages. We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model. vanilla embedding sharing in ELECTRA hurts training efficiency and model performance. We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model. Inspired by the similarity of the ReCAM task and the language pre-training, we propose a simple yet effective technology, namely, negative augmentation with language model. Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
LRG at SemEval-2021 Task 4: Improving Reading Comprehension with Abstract Words using Augmentation, Linguistic Features and Voting [0.6850683267295249]
Given a fill-in-the-blank-type question, the task is to predict the most suitable word from a list of 5 options. We use encoders of transformers-based models pre-trained on the masked language modelling (MLM) task to build our Fill-in-the-blank (FitB) models. We propose variants, namely Chunk Voting and Max Context, to take care of input length restrictions for BERT, etc.
arXiv Detail & Related papers (2021-02-24T12:33:12Z)
When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data [84.87772675171412]
We study the circumstances under which explanations of individual data points can improve modeling performance. We make use of three existing datasets with explanations: e-SNLI, TACRED, SemEval.
arXiv Detail & Related papers (2021-02-03T18:57:08Z)
BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [1.433758865948252]
This paper describes work of the BUT-FIT's team at SemEval 2020 Task 4 - Commonsense Validation and Explanation. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and machine translated dataset. We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss.
arXiv Detail & Related papers (2020-08-17T12:45:39Z)
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model [5.428461405329692]
This paper describes our submission to subtask a and b of SemEval-2020 Task 4. For subtask a, we use a ALBERT based model with improved input form to pick out the common sense statement from two statement candidates. For subtask b, we use a multiple choice model enhanced by hint sentence mechanism to select the reason from given options about why a statement is against common sense.
arXiv Detail & Related papers (2020-07-06T05:51:10Z)
KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation [4.94950858749529]
We propose a novel way to search for evidence and choose the different large-scale pre-trained models as the backbone for three subtasks. The results show that our evidence-searching approach improves model performance on commonsense explanation task.
arXiv Detail & Related papers (2020-05-24T15:09:21Z)
Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models. Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.