Reading Comprehension as Natural Language Inference: A Semantic Analysis
- URL: http://arxiv.org/abs/2010.01713v1
- Date: Sun, 4 Oct 2020 22:50:59 GMT
- Title: Reading Comprehension as Natural Language Inference: A Semantic Analysis
- Authors: Anshuman Mishra, Dhruvesh Patel, Aparna Vijayakumar, Xiang Li, Pavan
Kapanipathi, Kartik Talamadupula
- Abstract summary: We explore the utility of Natural Language Inference (NLI) for Question Answering (QA).
We transform one of the largest available MRC datasets (RACE) into an NLI form, and compare the performance of a state-of-the-art model (RoBERTa) on both forms.
We highlight clear categories for which the model performs better when the data is presented in a coherent entailment form, and in a structured question-answer concatenation form, respectively.
- Score: 15.624486319943015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the recent past, Natural Language Inference (NLI) has gained significant
attention, particularly given its promise for downstream NLP tasks. However,
its true impact is limited and has not been well studied. Therefore, in this
paper, we explore the utility of NLI for one of the most prominent downstream
tasks, viz. Question Answering (QA). We transform one of the largest
available MRC datasets (RACE) into an NLI form, and compare the performance of a
state-of-the-art model (RoBERTa) on both forms. We propose new
characterizations of questions, and evaluate the performance of QA and NLI
models on these categories. We highlight clear categories for which the model
performs better when the data is presented in a coherent entailment
form, and in a structured question-answer concatenation form, respectively.
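As an illustration of the transformation the abstract describes, the sketch below converts a RACE-style multiple-choice example into NLI premise-hypothesis pairs. The function name, the placeholder-substitution rule for cloze questions, and the concatenation fallback are assumptions for illustration, not the authors' exact conversion procedure:

```python
# Illustrative sketch: converting a RACE-style MRC example into NLI pairs.
# The cloze-substitution rule and the question-answer concatenation fallback
# are assumptions for illustration, not the paper's exact conversion rules.

def race_to_nli(passage, question, options, answer_idx):
    """Yield (premise, hypothesis, label) triples, one per answer option."""
    examples = []
    for i, option in enumerate(options):
        if "_" in question:
            # Cloze-style question: substitute the option into the blank
            # to obtain a more coherent, declarative hypothesis.
            hypothesis = question.replace("_", option)
        else:
            # Fallback: structured question-answer concatenation form.
            hypothesis = f"{question} {option}"
        label = "entailment" if i == answer_idx else "not_entailment"
        examples.append((passage, hypothesis, label))
    return examples

# Toy usage:
pairs = race_to_nli(
    passage="The sun rises in the east and sets in the west.",
    question="The sun sets in the _.",
    options=["east", "west", "north", "south"],
    answer_idx=1,
)
for premise, hypothesis, label in pairs:
    print(label, "|", hypothesis)
```

Under this sketch, the correct option yields the entailed hypothesis and the distractors yield non-entailed ones, so a standard sentence-pair NLI model can be trained or evaluated on the converted data directly.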
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is preferred by human annotators over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses from a wide range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness [19.79160738554967]
Conditional language models still generate unfaithful output that is not supported by their input.
We show that pure NLI models can outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures.
arXiv Detail & Related papers (2023-05-26T11:00:04Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only the Wikipedia title of each candidate as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset (one plausible aggregation strategy is sketched after this list).
arXiv Detail & Related papers (2022-04-15T12:56:39Z)
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv Detail & Related papers (2021-06-17T13:02:26Z)
- Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks [15.624486319943015]
In recent years, the Natural Language Inference (NLI) task has garnered significant attention.
We study this unfulfilled promise from the lens of two downstream tasks: question answering (QA), and text summarization.
We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise.
arXiv Detail & Related papers (2020-09-18T21:44:35Z)
- Ranking Clarification Questions via Natural Language Inference [25.433933534561568]
Given a natural language query, teaching machines to ask clarifying questions is of immense utility in practical natural language processing systems.
For the task of ranking clarification questions, we hypothesize that determining whether a clarification question pertains to a missing entry in a given post can be considered a special case of Natural Language Inference (NLI).
We validate this hypothesis by incorporating representations from a Siamese BERT model fine-tuned on NLI and Multi-NLI datasets into our models.
arXiv Detail & Related papers (2020-08-18T01:32:29Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)
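The document-level aggregation mentioned in the "Stretching Sentence-pair NLI Models" entry above is not detailed in its summary. The following is a minimal sketch of one plausible strategy, a max reduction over sentence-level entailment probabilities, where nli_entail_prob is a hypothetical stand-in for any sentence-pair NLI model; this is an assumed illustration, not the paper's actual method:

```python
# Illustrative sketch of document-level NLI via sentence-pair aggregation.
# nli_entail_prob is a hypothetical stand-in for any sentence-pair NLI model;
# the max-aggregation rule is one plausible strategy, not the paper's method.

from typing import Callable, List

def doc_entail_score(
    premise_sentences: List[str],
    hypothesis: str,
    nli_entail_prob: Callable[[str, str], float],
) -> float:
    """Score a hypothesis against a full document by aggregating
    sentence-level entailment probabilities with a max reduction:
    the hypothesis counts as entailed if some sentence supports it."""
    return max(nli_entail_prob(s, hypothesis) for s in premise_sentences)

# Toy usage with a keyword-overlap stand-in for a real NLI model:
def toy_nli(premise: str, hypothesis: str) -> float:
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

doc = [
    "The contract may be terminated with 30 days notice.",
    "All disputes are governed by the laws of Delaware.",
]
print(doc_entail_score(doc, "The contract allows termination.", toy_nli))
```

A max reduction fits claims supported by a single sentence; other reductions (mean, top-k) may suit claims whose evidence is spread across the document.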
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.