Evaluating NLP Systems On a Novel Cloze Task: Judging the Plausibility of Possible Fillers in Instructional Texts
- URL: http://arxiv.org/abs/2112.01867v1
- Date: Fri, 3 Dec 2021 12:02:52 GMT
- Title: Evaluating NLP Systems On a Novel Cloze Task: Judging the Plausibility of Possible Fillers in Instructional Texts
- Authors: Zizhao Hu, Ravikiran Chanumolu, Xingyu Lin, Nayela Ayaz, Vincent Chi
- Abstract summary: The cloze task is widely used to evaluate an NLP system's language understanding ability.
A new task is proposed: predicting whether a filler word in a cloze task is a good, neutral, or bad candidate.
- Score: 2.3449131636069898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cloze task is widely used to evaluate an NLP system's language
understanding ability. However, most existing cloze tasks only require NLP
systems to give the best prediction relative to the other candidates for each
input sample, rather than to judge the absolute quality of all possible
predictions consistently across the input domain. Thus, a new task is proposed:
predicting whether a filler word in a cloze task is a good, neutral, or bad
candidate. More complicated versions can extend this to predicting more
discrete classes or continuous scores. We focus on Subtask A of SemEval-2022
Task 7, explore some possible architectures for solving this new task, provide
a detailed comparison of them, and propose an ensemble method that improves
traditional models on this new task.
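The proposed task reduces to three-way sequence classification over contexts with a candidate filler substituted in. Below is a minimal sketch of one possible baseline; the backbone checkpoint, the "___" slot marker, and the example are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch: classify a candidate filler as good / neutral / bad with a
# generic transformer classifier. Model choice and slot marker are assumptions
# for illustration, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # hypothetical backbone
LABELS = ["bad", "neutral", "good"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def score_filler(instruction: str, filler: str) -> str:
    # Substitute the candidate filler into the cloze slot, marked here as "___".
    text = instruction.replace("___", filler)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# The classification head is untrained here, so outputs are arbitrary until
# the model is fine-tuned on labeled (context, filler, plausibility) triples.
print(score_filler("Cover the bowl with a ___ before microwaving.", "lid"))
```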
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
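A minimal sketch of the instance-level step under stated assumptions: per-instance difficulty is approximated by a placeholder scoring function, and instances are sorted into easy-to-difficult mini-batches. This is not Data-CUBE's actual difficulty measure.

```python
# Hypothetical sketch of easy-to-difficult mini-batching: sort a task's
# instances by an assumed per-instance difficulty score, then chunk.
from typing import Any, Callable, List

def curriculum_batches(instances: List[Any],
                       difficulty: Callable[[Any], float],
                       batch_size: int) -> List[List[Any]]:
    ordered = sorted(instances, key=difficulty)  # easiest first
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Example with a toy difficulty proxy (sentence length).
data = ["a b", "a", "a b c d", "a b c"]
for batch in curriculum_batches(data, difficulty=len, batch_size=2):
    print(batch)
```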
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive number of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
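One way to read "prompt uncertainty" is disagreement between a model's outputs under perturbed prompts for the same task. A hedged sketch follows, with disagreement measured as prediction variance; the predict function and paraphrases are placeholders, not the paper's definitions.

```python
# Sketch: rank tasks by how much a model's predictions disagree across
# paraphrased prompts. Higher disagreement = more "uncertain" task.
# predict() and the paraphrases are stand-ins for a real LLM call.
from statistics import pvariance
from typing import Callable, Dict, List

def prompt_uncertainty(prompts: List[str],
                       predict: Callable[[str], float]) -> float:
    # Variance of a scalar prediction score across prompt variants.
    return pvariance([predict(p) for p in prompts])

def select_tasks(task_prompts: Dict[str, List[str]],
                 predict: Callable[[str], float],
                 k: int) -> List[str]:
    scored = {t: prompt_uncertainty(ps, predict) for t, ps in task_prompts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy example: a fake model whose score depends only on prompt length.
tasks = {"qa": ["Answer:", "Please answer:"], "nli": ["Entails?", "Does A entail B?"]}
print(select_tasks(tasks, predict=lambda p: float(len(p)), k=1))
```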
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Class Incremental Learning via Likelihood Ratio Based Task Prediction [20.145128455767587]
An emerging theory-guided approach trains a task-specific model for each task within a shared network for all tasks.
This paper argues that using a traditional OOD detector for task-id prediction is sub-optimal because additional information can be exploited.
We call the new method TPL (Task-id Prediction based on Likelihood Ratio).
It markedly outperforms strong CIL baselines and has negligible catastrophic forgetting.
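A minimal sketch of the likelihood-ratio idea for task-id prediction: score each task by the ratio of its in-task likelihood to a shared background likelihood, rather than by an OOD score alone. The Gaussian models here are toy assumptions, not TPL's estimators.

```python
# Sketch: pick the task whose in-distribution likelihood most exceeds a
# shared background likelihood. Gaussian models are toy placeholders.
import math
from typing import Dict, Tuple

def gaussian_loglik(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def predict_task(x: float,
                 task_models: Dict[str, Tuple[float, float]],
                 background: Tuple[float, float]) -> str:
    bg = gaussian_loglik(x, *background)
    # Log-likelihood ratio of "task t generated x" vs. "background generated x".
    ratios = {t: gaussian_loglik(x, *m) - bg for t, m in task_models.items()}
    return max(ratios, key=ratios.get)

print(predict_task(1.9, {"t1": (0.0, 1.0), "t2": (2.0, 1.0)}, background=(1.0, 3.0)))
```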
arXiv Detail & Related papers (2023-09-26T16:25:57Z)
- Improving Task Generalization via Unified Schema Prompt [87.31158568180514]
Unified Prompt is a flexible prompting method that automatically customizes the learnable prompts for each task according to the task input schema.
It models the shared knowledge between tasks while keeping the characteristics of different task schemas.
The framework achieves strong zero-shot and few-shot performance on 16 unseen downstream tasks from 8 task types.
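A hedged sketch of the schema-prompt idea: compose a prompt from learnable pieces keyed by each component of a task's input schema, so tasks sharing schema components share prompt parameters. The component names, dimensions, and composition are illustrative assumptions.

```python
# Sketch: a prompt built from per-schema-component learnable embeddings.
# Tasks that share components ("question", "context", ...) share parameters.
import torch
import torch.nn as nn

class SchemaPrompt(nn.Module):
    def __init__(self, components, dim=16, tokens_per_component=2):
        super().__init__()
        self.parts = nn.ParameterDict({
            c: nn.Parameter(torch.randn(tokens_per_component, dim))
            for c in components
        })

    def forward(self, task_schema):
        # Concatenate the learnable prompt tokens for this task's schema.
        return torch.cat([self.parts[c] for c in task_schema], dim=0)

prompt = SchemaPrompt(["question", "context", "options"])
qa = prompt(["question", "context"])  # shape (4, 16)
mc = prompt(["question", "options"])  # reuses the "question" parameters
print(qa.shape, mc.shape)
```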
arXiv Detail & Related papers (2022-08-05T15:26:36Z)
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP [68.43279384561352]
Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods carry only trivial task-specific knowledge and are limited to yielding low-quality synthetic data for weak baselines on simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
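A minimal sketch of using a seq2seq LM for task-conditioned augmentation: generation is conditioned on a task tag plus a seed example. The checkpoint and prompt format are assumed stand-ins, not KnowDA's own pretrained model or interface.

```python
# Sketch: generate synthetic training text with a generic seq2seq model,
# conditioned on a task tag. The T5 checkpoint and prompt format are
# assumptions, not KnowDA's actual model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "t5-small"  # placeholder, not KnowDA's checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def augment(task: str, seed_example: str, n: int = 3):
    prompt = f"{task}: {seed_example}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=True, top_p=0.9,
                             num_return_sequences=n, max_new_tokens=32)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

for text in augment("sentiment", "The movie was surprisingly good."):
    print(text)
```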
arXiv Detail & Related papers (2022-06-21T11:34:02Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
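The core of the method is a consistency loss between predictive distributions obtained from different prompts for the same input. A minimal sketch using symmetric KL divergence between two prompt variants; the logits here are toy tensors, not T0 outputs.

```python
# Sketch: regularize two prompts for the same task to agree, via
# symmetric KL divergence between their predicted label distributions.
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Toy logits from two prompt phrasings of the same unlabeled input.
a = torch.tensor([[2.0, 0.5, -1.0]])
b = torch.tensor([[1.5, 1.0, -0.5]])
print(consistency_loss(a, b))
```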
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
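A hedged sketch of representing a task by gradient information: run the task's support examples through a base model, take the gradient of the loss with respect to its parameters, and use the flattened gradient as the task embedding. The tiny linear model and loss are placeholders for a pretrained LM.

```python
# Sketch: embed a task as the gradient of a base model's loss on its
# support set. The linear model here is a toy stand-in for a PLM.
import torch
import torch.nn as nn

base = nn.Linear(4, 2)  # placeholder base model

def task_embedding(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    base.zero_grad()
    loss = nn.functional.cross_entropy(base(x), y)
    loss.backward()
    # Flatten gradients of all parameters into one task vector.
    return torch.cat([p.grad.flatten() for p in base.parameters()]).detach()

x = torch.randn(8, 4)              # toy support set
y = torch.randint(0, 2, (8,))
print(task_embedding(x, y).shape)  # one vector per task
```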
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Learning from Task Descriptions [24.588252048132862]
We introduce a framework for developing NLP systems that solve new tasks after reading their descriptions.
We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation.
We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.
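The evaluation format pairs a natural-language task description with an input. A minimal sketch of the zero-shot setup the paper evaluates, using a generic T5 checkpoint; the exact ZEST prompt template is an assumption.

```python
# Sketch: answer a ZEST-style (task description, input) pair with a
# generic T5 model. The prompt template is an assumed illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "t5-small"  # stand-in for the paper's T5 variant
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def solve(description: str, passage: str) -> str:
    prompt = f"task: {description} input: {passage}"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=16)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(solve("Is this hiking trail dog friendly?",
            "Dogs on leash are welcome on all trails."))
```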
arXiv Detail & Related papers (2020-11-16T17:25:24Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
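A heavily hedged sketch of one plausible form of adaptive task sampling: bias class selection for each few-shot episode toward classes the model currently gets wrong, instead of sampling uniformly. The per-class error table is a toy assumption, not the paper's sampling scheme.

```python
# Sketch: sample few-shot episode classes with probability proportional
# to a running per-class error estimate, so hard classes appear more
# often than they would under uniform sampling.
import random

def sample_episode(class_errors: dict, n_way: int) -> list:
    classes = list(class_errors)
    weights = [class_errors[c] + 1e-6 for c in classes]  # avoid zero weight
    picked = set()
    while len(picked) < n_way:
        picked.add(random.choices(classes, weights=weights, k=1)[0])
    return sorted(picked)

errors = {"cat": 0.05, "dog": 0.40, "fox": 0.30, "owl": 0.10}
print(sample_episode(errors, n_way=2))
```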
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.