A Survey on Out-of-Distribution Evaluation of Neural NLP Models
- URL: http://arxiv.org/abs/2306.15261v1
- Date: Tue, 27 Jun 2023 07:44:25 GMT
- Title: A Survey on Out-of-Distribution Evaluation of Neural NLP Models
- Authors: Xinzhe Li, Ming Liu, Shang Gao and Wray Buntine
- Abstract summary: Adversarial robustness, domain generalization and dataset biases are three active lines of research contributing to out-of-distribution evaluation on neural NLP models.
In this survey, we 1) compare the three lines of research under a unifying definition; 2) summarize the data-generating processes and evaluation protocols for each line of research; and 3) emphasize the challenges and opportunities for future work.
- Score: 8.346304805498988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial robustness, domain generalization and dataset biases are three
active lines of research contributing to out-of-distribution (OOD) evaluation
of neural NLP models. However, a comprehensive, integrated discussion of the
three research lines is still lacking in the literature. In this survey, we 1)
compare the three lines of research under a unifying definition; 2) summarize
the data-generating processes and evaluation protocols for each line of
research; and 3) emphasize the challenges and opportunities for future work.
Related papers
- AAAR-1.0: Assessing AI's Potential to Assist Research [34.88341605349765]
We introduce AAAR-1.0, a benchmark dataset designed to evaluate the performance of large language models (LLMs) on three fundamental, expertise-intensive research tasks.
AAAR-1.0 differs from prior benchmarks in two key ways: first, it is explicitly research-oriented, with tasks requiring deep domain expertise; second, it is researcher-oriented, mirroring the primary activities that researchers engage in on a daily basis.
arXiv Detail & Related papers (2024-10-29T17:58:29Z)
- Systematic Exploration of Dialogue Summarization Approaches for Reproducibility, Comparative Assessment, and Methodological Innovations for Advancing Natural Language Processing in Abstractive Summarization [0.0]
This paper delves into the reproduction and evaluation of dialogue summarization models.
Our research involved a thorough examination of several dialogue summarization models using the AMI dataset.
The primary objective was to evaluate the informativeness and quality of the summaries generated by these models through human assessment.
arXiv Detail & Related papers (2024-10-21T12:47:57Z)
- A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models [43.37740735934396]
Text summarization research has undergone several significant transformations with the advent of deep neural networks, pre-trained language models (PLMs), and recent large language models (LLMs).
This survey provides a comprehensive review of the research progress and evolution in text summarization through the lens of these paradigm shifts.
arXiv Detail & Related papers (2024-06-17T07:52:32Z)
- A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into interpretable cross-modal reasoning (I-CMR) and presents a comprehensive overview of the typical methods, organized in a three-level taxonomy.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
- A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers [76.51245425667845]
Relation extraction (RE) involves identifying the relations between entities from underlying content.
Deep neural networks have dominated the field of RE and made noticeable progress.
This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.
arXiv Detail & Related papers (2023-06-03T08:39:25Z)
- A Survey on Out-of-Distribution Detection in NLP [119.80687868012393]
Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world.
This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches.
arXiv Detail & Related papers (2023-05-05T01:38:49Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task, based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
A probing analysis of the features reveals their sensitivity to surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- A Survey on Low-Resource Neural Machine Translation [106.51056217748388]
We classify related works into three categories according to the auxiliary data they use.
We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms.
arXiv Detail & Related papers (2021-07-09T06:26:38Z)
- A Discussion on Practical Considerations with Sparse Regression Methodologies [0.0]
Two papers published in Statistical Science study the comparative performance of several sparse regression methodologies.
We summarize and compare the two studies and aim to provide clarity and value to users.
arXiv Detail & Related papers (2020-11-18T15:58:35Z)
- Evaluation of Text Generation: A Survey [107.62760642328455]
This paper surveys evaluation methods for natural language generation (NLG) systems that have been developed in the last few years.
We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics.
arXiv Detail & Related papers (2020-06-26T04:52:48Z)