LRG at SemEval-2021 Task 4: Improving Reading Comprehension with
Abstract Words using Augmentation, Linguistic Features and Voting
- URL: http://arxiv.org/abs/2102.12255v1
- Date: Wed, 24 Feb 2021 12:33:12 GMT
- Title: LRG at SemEval-2021 Task 4: Improving Reading Comprehension with
Abstract Words using Augmentation, Linguistic Features and Voting
- Authors: Abheesht Sharma, Harshit Pandey, Gunjan Chhablani, Yash Bhartia,
Tirtharaj Dash
- Abstract summary: Given a fill-in-the-blank-type question, the task is to predict the most suitable word from a list of 5 options.
We use encoders of transformer-based models pre-trained on the masked language modelling (MLM) task to build our fill-in-the-blank (FitB) models.
We propose two variants, Chunk Voting and Max Context, to handle the input-length restrictions of models such as BERT.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we present our methodologies for SemEval-2021 Task-4:
Reading Comprehension of Abstract Meaning. Given a fill-in-the-blank-type
question and a corresponding context, the task is to predict the most suitable
word from a list of 5 options. There are three sub-tasks within this task:
Imperceptibility (subtask-I), Non-Specificity (subtask-II), and Intersection
(subtask-III). We use encoders of transformer-based models pre-trained on the
masked language modelling (MLM) task to build our fill-in-the-blank (FitB)
models. Moreover, to model imperceptibility, we define certain linguistic
features, and to model non-specificity, we leverage information from hypernyms
and hyponyms provided by a lexical database. Specifically, for non-specificity,
we experiment with augmentation and other statistical techniques. We also
propose two variants, Chunk Voting and Max Context, to handle the input-length
restrictions of models such as BERT. Additionally, we perform a thorough ablation
study, and use Integrated Gradients to explain our predictions on a few
samples. Our best submissions achieve accuracies of 75.31% and 77.84%, on the
test sets for subtask-I and subtask-II, respectively. For subtask-III, we
achieve accuracies of 65.64% and 62.27%.
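To make the FitB setup concrete, the following is a minimal sketch, not the authors' released code, of how an MLM encoder such as BERT can score the five candidate options. It assumes the ReCAM-style "@placeholder" blank marker and, as a simplification, scores each option by its first subword only; the model choice and function names are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative model choice; the paper explores several MLM encoders.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pick_option(context: str, question: str, options: list) -> int:
    """Return the index of the option the MLM rates most probable."""
    # Replace the blank with the model's mask token.
    text = context + " " + question.replace("@placeholder", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Note: naive truncation can drop the mask for very long contexts;
    # the Chunk Voting / Max Context variants exist to handle exactly that.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)
    # Score each option by the log-probability of its first subword.
    scores = [
        log_probs[tokenizer.convert_tokens_to_ids(tokenizer.tokenize(opt)[0])].item()
        for opt in options
    ]
    return max(range(len(options)), key=scores.__getitem__)
```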
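The Chunk Voting variant is only named in the abstract; the sketch below is one plausible reading under assumed chunk and stride sizes: split an over-long context into overlapping word windows that fit the encoder, score the options on each window with a scorer like pick_option above, and take a majority vote.

```python
from collections import Counter

def chunk_voting(context: str, question: str, options: list,
                 chunk_words: int = 300, stride_words: int = 150) -> int:
    """Majority vote over overlapping context chunks (illustrative sizes)."""
    words = context.split()
    chunks = []
    for start in range(0, max(len(words), 1), stride_words):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already covers the tail
    # Each chunk casts one vote via the single-chunk FitB scorer.
    votes = Counter(pick_option(chunk, question, options) for chunk in chunks)
    return votes.most_common(1)[0][0]
```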
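For non-specificity, the abstract mentions hypernyms and hyponyms from a lexical database; assuming that database is WordNet (a common choice, though the summary above does not name it), one simple cue is a word's depth in the hypernym hierarchy: abstract, non-specific words tend to sit near the root and to have many hyponyms.

```python
from nltk.corpus import wordnet as wn  # first run: nltk.download("wordnet")

def specificity_features(word: str):
    """Rough non-specificity cues for a candidate word (illustrative)."""
    synsets = wn.synsets(word)
    if not synsets:
        return None
    return {
        # Shallow depth => closer to the root => more abstract.
        "min_hypernym_depth": min(s.min_depth() for s in synsets),
        # Many direct hyponyms => a broad, non-specific concept.
        "max_num_hyponyms": max(len(s.hyponyms()) for s in synsets),
    }

# e.g. specificity_features("entity") yields depth 0, while a concrete
# word like "sparrow" sits much deeper in the hierarchy.
```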
Related papers
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Toward Efficient Language Model Pretraining and Downstream Adaptation
via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language
Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a multi-task learning model in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis [72.9124467710526]
Generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text as a single task.
We propose a unified framework for solving ABSA and its associated sub-tasks, improving performance in few-shot scenarios.
arXiv Detail & Related papers (2022-10-12T23:38:57Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using
the Long Document Transformer [8.645929825516816]
This paper presents a technical report of our submission to the 4th task of SemEval-2021, titled Reading Comprehension of Abstract Meaning.
In this task, the goal is to predict the correct answer to a question, given a context.
To tackle this problem, we used the Longformer model to better handle long input sequences.
arXiv Detail & Related papers (2021-05-08T20:48:32Z)
- ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language
Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to train our models, along with the process of tuning them and selecting the best model.
Inspired by the similarity between the ReCAM task and language pre-training, we propose a simple yet effective technique: negative augmentation with a language model.
Our models achieve 4th place on the official test sets of both Subtask 1 and Subtask 2, with accuracies of 87.9% and 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
- UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition
Extraction [0.17188280334580194]
This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks.
We use various pretrained language models to solve each of the three subtasks of the competition.
Our best-performing model, evaluated on the DeftEval dataset, obtains 32nd place on the first subtask and 37th place on the second subtask.
arXiv Detail & Related papers (2020-09-11T18:36:22Z)
- QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation
system based on ensemble of language model [2.728575246952532]
In this paper, we present the language model system submitted to the SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation".
We apply transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task.
The ensembled model solves this problem better, reaching an accuracy of 95.9% on subtask A, only 3% below human performance.
arXiv Detail & Related papers (2020-09-06T05:12:50Z)
- GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for
Offensive Language Detection [27.45642971636561]
The OffensEval 2020 task includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C).
Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C.
arXiv Detail & Related papers (2020-07-28T20:45:43Z)
- Words aren't enough, their order matters: On the Robustness of Grounding
Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z)