LRG at SemEval-2021 Task 4: Improving Reading Comprehension with
Abstract Words using Augmentation, Linguistic Features and Voting
- URL: http://arxiv.org/abs/2102.12255v1
- Date: Wed, 24 Feb 2021 12:33:12 GMT
- Title: LRG at SemEval-2021 Task 4: Improving Reading Comprehension with
Abstract Words using Augmentation, Linguistic Features and Voting
- Authors: Abheesht Sharma, Harshit Pandey, Gunjan Chhablani, Yash Bhartia,
Tirtharaj Dash
- Abstract summary: Given a fill-in-the-blank-type question, the task is to predict the most suitable word from a list of 5 options.
We use encoders of transformer-based models pre-trained on the masked language modelling (MLM) task to build our fill-in-the-blank (FitB) models.
We propose two variants, Chunk Voting and Max Context, to handle the input-length restrictions of models such as BERT.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we present our methodologies for SemEval-2021 Task-4:
Reading Comprehension of Abstract Meaning. Given a fill-in-the-blank-type
question and a corresponding context, the task is to predict the most suitable
word from a list of 5 options. There are three sub-tasks within this task:
Imperceptibility (subtask-I), Non-Specificity (subtask-II), and Intersection
(subtask-III). We use encoders of transformer-based models pre-trained on the
masked language modelling (MLM) task to build our fill-in-the-blank (FitB)
models. Moreover, to model imperceptibility, we define certain linguistic
features, and to model non-specificity, we leverage information from hypernyms
and hyponyms provided by a lexical database. Specifically, for non-specificity,
we experiment with augmentation and other statistical techniques. We also
propose two variants, Chunk Voting and Max Context, to handle the input-length
restrictions of models such as BERT. Additionally, we perform a thorough ablation
study, and use Integrated Gradients to explain our predictions on a few
samples. Our best submissions achieve accuracies of 75.31% and 77.84%, on the
test sets for subtask-I and subtask-II, respectively. For subtask-III, we
achieve accuracies of 65.64% and 62.27%.
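To make the FitB setup concrete, the following is a minimal sketch, not the authors' released code, of how an MLM encoder such as BERT can score the five candidate options. It assumes the ReCAM-style "@placeholder" blank marker and, as a simplification, scores each option by its first subword only; the model choice and function names are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative model choice; the paper explores several MLM encoders.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pick_option(context: str, question: str, options: list) -> int:
    """Return the index of the option the MLM rates most probable."""
    # Replace the blank with the model's mask token.
    text = context + " " + question.replace("@placeholder", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Note: naive truncation can drop the mask for very long contexts;
    # the Chunk Voting / Max Context variants exist to handle exactly that.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)
    # Score each option by the log-probability of its first subword.
    scores = [
        log_probs[tokenizer.convert_tokens_to_ids(tokenizer.tokenize(opt)[0])].item()
        for opt in options
    ]
    return max(range(len(options)), key=scores.__getitem__)
```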
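The Chunk Voting variant is only named in the abstract; the sketch below is one plausible reading under assumed chunk and stride sizes: split an over-long context into overlapping word windows that fit the encoder, score the options on each window with a scorer like pick_option above, and take a majority vote.

```python
from collections import Counter

def chunk_voting(context: str, question: str, options: list,
                 chunk_words: int = 300, stride_words: int = 150) -> int:
    """Majority vote over overlapping context chunks (illustrative sizes)."""
    words = context.split()
    chunks = []
    for start in range(0, max(len(words), 1), stride_words):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already covers the tail
    # Each chunk casts one vote via the single-chunk FitB scorer.
    votes = Counter(pick_option(chunk, question, options) for chunk in chunks)
    return votes.most_common(1)[0][0]
```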
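For non-specificity, the abstract mentions hypernyms and hyponyms from a lexical database; assuming that database is WordNet (a common choice, though the summary above does not name it), one simple cue is a word's depth in the hypernym hierarchy: abstract, non-specific words tend to sit near the root and to have many hyponyms.

```python
from nltk.corpus import wordnet as wn  # first run: nltk.download("wordnet")

def specificity_features(word: str):
    """Rough non-specificity cues for a candidate word (illustrative)."""
    synsets = wn.synsets(word)
    if not synsets:
        return None
    return {
        # Shallow depth => closer to the root => more abstract.
        "min_hypernym_depth": min(s.min_depth() for s in synsets),
        # Many direct hyponyms => a broad, non-specific concept.
        "max_num_hyponyms": max(len(s.hyponyms()) for s in synsets),
    }

# e.g. specificity_features("entity") yields depth 0, while a concrete
# word like "sparrow" sits much deeper in the hierarchy.
```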
Related papers
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Toward Efficient Language Model Pretraining and Downstream Adaptation
via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language
Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a multi-task learning model in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis [72.9124467710526]
Generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text as a single task.
We propose a unified framework for solving ABSA and its associated sub-tasks, improving performance in few-shot scenarios.
arXiv Detail & Related papers (2022-10-12T23:38:57Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using
the Long Document Transformer [8.645929825516816]
This paper presents a technical report of our submission to the 4th task of SemEval-2021, titled Reading Comprehension of Abstract Meaning.
In this task, the goal is to predict the correct answer to a question, given a context.
To tackle this problem, we used the Longformer model to better handle long input sequences.
arXiv Detail & Related papers (2021-05-08T20:48:32Z)
- ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language
Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to train our models, along with the process of tuning them and selecting the best model.
Inspired by the similarity between the ReCAM task and language pre-training, we propose a simple yet effective technique: negative augmentation with a language model.
Our models achieve 4th place on the official test sets of both Subtask 1 and Subtask 2, with accuracies of 87.9% and 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
- UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition
Extraction [0.17188280334580194]
This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks.
We use various pretrained language models to solve each of the three subtasks of the competition.
Our best-performing model, evaluated on the DeftEval dataset, obtains 32nd place on the first subtask and 37th place on the second subtask.
arXiv Detail & Related papers (2020-09-11T18:36:22Z)
- QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation
system based on ensemble of language model [2.728575246952532]
In this paper, we present the language model system submitted to the SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation".
We apply transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task.
The ensembled model solves this problem better, reaching an accuracy of 95.9% on subtask A, only 3% below human performance.
arXiv Detail & Related papers (2020-09-06T05:12:50Z)
- GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for
Offensive Language Detection [27.45642971636561]
The OffensEval 2020 task includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C).
Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C.
arXiv Detail & Related papers (2020-07-28T20:45:43Z)
- Words aren't enough, their order matters: On the Robustness of Grounding
Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z)