Interviewer-Candidate Role Play: Towards Developing Real-World NLP Systems
- URL: http://arxiv.org/abs/2107.00315v1
- Date: Thu, 1 Jul 2021 09:08:43 GMT
- Title: Interviewer-Candidate Role Play: Towards Developing Real-World NLP Systems
- Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral
- Abstract summary: We present a multi-stage task that simulates a typical human-human questioner-responder interaction such as an interview.
The system is provided with question simplifications, knowledge statements, examples, etc. at various stages to improve its prediction when it is not sufficiently confident.
We conduct comprehensive experiments and find that the multi-stage formulation of our task improves OOD generalization by up to 2.29% in Stage 1, 1.91% in Stage 2, 54.88% in Stage 3, and 72.02% in Stage 4 over standard unguided prediction.
- Score: 16.859554473181348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard NLP tasks do not incorporate several common real-world scenarios
such as seeking clarifications about the question, taking advantage of clues,
abstaining in order to avoid incorrect answers, etc. This difference in task
formulation hinders the adoption of NLP systems in real-world settings. In this
work, we take a step towards bridging this gap and present a multi-stage task
that simulates a typical human-human questioner-responder interaction such as
an interview. Specifically, the system is provided with question
simplifications, knowledge statements, examples, etc. at various stages to
improve its prediction when it is not sufficiently confident. We instantiate
the proposed task in the Natural Language Inference setting, where a system is
evaluated on both in-domain and out-of-domain (OOD) inputs. We conduct
comprehensive experiments and find that the multi-stage formulation of our task
improves OOD generalization by up to 2.29% in Stage 1, 1.91% in Stage 2, 54.88% in Stage 3, and 72.02% in Stage 4 over standard unguided prediction. However, our task leaves a significant open challenge for NLP researchers: further improving OOD performance at each stage.
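To make the interaction concrete, the following is a minimal sketch of the confidence-gated, multi-stage loop described above, assuming a model that returns a confidence alongside each prediction. The threshold, the stage aids, and the toy model are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Tuple

THRESHOLD = 0.9  # assumed confidence cut-off for accepting an answer

def predict_with_confidence(question: str, aids: List[str]) -> Tuple[str, float]:
    """Toy stand-in for an NLI model; confidence rises with each aid taken."""
    confidence = min(1.0, 0.5 + 0.2 * len(aids))
    return "entailment", confidence  # placeholder label

def interview(question: str, stage_aids: List[str]) -> str:
    """Answer immediately if confident; otherwise take the next stage's aid
    (simplification, knowledge statement, examples, ...) and predict again."""
    taken: List[str] = []
    answer, confidence = predict_with_confidence(question, taken)
    for aid in stage_aids:  # Stages 1-4 in the paper's formulation
        if confidence >= THRESHOLD:
            break           # confident enough: stop asking for help
        taken.append(aid)   # accept the interviewer's next hint
        answer, confidence = predict_with_confidence(question, taken)
    return answer

print(interview("Does the premise entail the hypothesis?",
                ["question simplification", "knowledge statement",
                 "examples", "further hints"]))
```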
Related papers
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
- Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs [58.620269228776294]
We propose a task-agnostic framework for resolving ambiguity by asking users clarifying questions.
We evaluate systems across three NLP applications: question answering, machine translation and natural language inference.
We find that the proposed method, intent-sim, is robust, demonstrating improvements across a wide range of NLP tasks and LMs.
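As a rough illustration of "clarify only when necessary", the sketch below samples plausible readings of an ambiguous input and asks a clarifying question only when the answers they yield disagree. The toy answerer and agreement cutoff are assumptions, not the authors' intent-sim implementation.

```python
from collections import Counter

def answer_for(reading: str) -> str:
    """Toy stand-in for an LM answering one disambiguated reading."""
    return "negative" if "not" in reading else "positive"

def maybe_clarify(readings: list[str], agreement_cutoff: float = 0.8) -> str:
    """Answer directly when the readings agree; otherwise ask the user."""
    answers = [answer_for(r) for r in readings]
    top, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= agreement_cutoff:
        return top        # interpretations converge: no question needed
    return "CLARIFY"      # interpretations diverge: ask a clarifying question

print(maybe_clarify(["the movie was good", "the movie was not good"]))  # CLARIFY
```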
arXiv Detail & Related papers (2023-11-16T00:18:50Z)
- A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing [68.37496795076203]
We provide guidance for NLP researchers and practitioners dealing with imbalanced data.
We first discuss various types of controlled and real-world class imbalance.
We organize the methods by whether they are based on sampling, data augmentation, choice of loss function, staged learning, or model design.
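As one concrete instance from the loss-function family of remedies, the sketch below applies inverse-frequency class weights with PyTorch's built-in cross-entropy loss. The class counts are illustrative, and this is only one of the many options the survey organizes.

```python
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 80.0, 20.0])       # assumed per-class training counts
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
loss_fn = nn.CrossEntropyLoss(weight=weights)    # misclassifying rare classes costs more

logits = torch.randn(4, 3)                       # toy batch of 4 examples, 3 classes
labels = torch.tensor([0, 2, 2, 1])
print(loss_fn(logits, labels))
```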
arXiv Detail & Related papers (2022-10-10T13:26:40Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
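A minimal sketch of the underlying idea: on unlabeled data, penalize disagreement between predictions made under two prompt renderings of the same input. The symmetric-KL term below is an assumed stand-in for the paper's regularizer, not its training code.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between label distributions from two prompt templates."""
    log_p, log_q = F.log_softmax(logits_a, dim=-1), F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Toy usage: logits for the same unlabeled batch under two prompt templates.
a, b = torch.randn(8, 3), torch.randn(8, 3)
print(consistency_loss(a, b))
```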
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Deep Transformers for Patronizing and Condescending Language Detection [4.883341580669763]
We propose a novel Transformer-based model and its ensembles to accurately capture the language context for patronizing and condescending language (PCL) detection.
To facilitate comprehension of the subtle and subjective nature of PCL, two fine-tuning strategies are applied.
The system achieves remarkable results in the official ranking: 1st in Subtask 1 and 5th in Subtask 2.
arXiv Detail & Related papers (2022-03-09T10:05:10Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
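A toy sketch of the adaptive-retrieval step: treat the task's prompt as a query and pull related raw text for continued pretraining to consume. The overlap scorer and mini-corpus are assumptions for illustration, not AdaPrompt's retrieval pipeline.

```python
def retrieve(prompt: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank corpus passages by word overlap with the task prompt (toy scorer)."""
    query = set(prompt.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(query & set(doc.lower().split())))
    return ranked[:k]

corpus = [
    "movie reviews praise the acting and direction",
    "stock prices fell sharply in early trading",
    "the film's plot was widely criticized by critics",
]
# Retrieved text would then be used for continual pretraining of the PLM.
print(retrieve("is this movie review positive or negative", corpus))
```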
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
- Confidence-Aware Active Feedback for Efficient Instance Search [21.8172170825049]
Relevance feedback is widely used in instance search (INS) tasks to further refine imperfect ranking results.
We propose a confidence-aware active feedback (CAAF) method that can efficiently select the most valuable feedback candidates.
In particular, CAAF outperforms the first-place record in the public large-scale video INS evaluation of TRECVID 2021.
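One way to read "confidence-aware" selection, sketched here: query the user about candidates whose ranking scores sit closest to the decision boundary, where feedback is most informative. The boundary criterion and the scores are toy assumptions, not the CAAF objective.

```python
def select_for_feedback(scores: dict[str, float], boundary: float = 0.5,
                        k: int = 2) -> list[str]:
    """Pick the k items whose relevance scores are least certain."""
    return sorted(scores, key=lambda item: abs(scores[item] - boundary))[:k]

# Toy ranking scores for four video clips in an instance-search result list.
scores = {"clip_a": 0.95, "clip_b": 0.52, "clip_c": 0.48, "clip_d": 0.10}
print(select_for_feedback(scores))  # ['clip_b', 'clip_c']: nearest the boundary
```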
arXiv Detail & Related papers (2021-10-23T16:14:03Z)
- Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning [70.76016793057283]
In this work, we study how pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of NLP tasks that differ widely.
In experiments, we study diverse few-shot NLP tasks and surprisingly find that in a 5-dimensional subspace found with 100 random tasks, by only tuning 5 free parameters, we can recover 87% and 65% of the full prompt tuning performance.
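The reparameterization behind that finding can be sketched as follows: a soft prompt is generated from a handful of free parameters through a projection into the full prompt space, assumed here to have been found via multi-task prompt tuning and then frozen. Shapes and names are illustrative.

```python
import torch

PROMPT_LEN, HIDDEN, INTRINSIC_DIM = 20, 768, 5

# Projection from the 5-dim intrinsic subspace to the full soft-prompt space,
# assumed to have been learned from ~100 training tasks and then frozen.
projection = torch.randn(INTRINSIC_DIM, PROMPT_LEN * HIDDEN)

# Adapting to a new task now means tuning only these 5 free parameters.
z = torch.zeros(INTRINSIC_DIM, requires_grad=True)
soft_prompt = (z @ projection).view(PROMPT_LEN, HIDDEN)
print(soft_prompt.shape)  # torch.Size([20, 768])
```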
arXiv Detail & Related papers (2021-10-15T05:43:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.