GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and
Linguistic Evaluation
- URL: http://arxiv.org/abs/2306.01966v2
- Date: Fri, 22 Sep 2023 03:31:17 GMT
- Title: GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and
Linguistic Evaluation
- Authors: Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica
Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes
- Abstract summary: We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens.
GENTLE is manually annotated for a variety of popular NLP tasks.
We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks.
- Score: 15.886585212606787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GENTLE, a new mixed-genre English challenge corpus totaling 17K
tokens and consisting of 8 unusual text types for out-of-domain evaluation:
dictionary entries, esports commentaries, legal documents, medical notes,
poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually
annotated for a variety of popular NLP tasks, including syntactic dependency
parsing, entity recognition, coreference resolution, and discourse parsing. We
evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for
at least some genres in their performance on all tasks, which indicates
GENTLE's utility as an evaluation dataset for NLP systems.
Related papers
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- Text Categorization Can Enhance Domain-Agnostic Stopword Extraction [3.6048839315645442]
This paper investigates the role of text categorization in streamlining stopword extraction in natural language processing (NLP).
By leveraging the MasakhaNEWS, African Stopwords Project, and MasakhaPOS datasets, our findings emphasize that text categorization effectively identifies domain-agnostic stopwords with over 80% detection success rate for most examined languages.
arXiv Detail & Related papers (2024-01-24T11:52:05Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Gen-Z: Generative Zero-Shot Text Classification with Contextualized
Label Descriptions [50.92702206798324]
We propose a generative prompting framework for zero-shot text classification.
GEN-Z measures the LM likelihood of input text conditioned on natural language descriptions of labels.
We show that zero-shot classification with simple contextualization of the data source consistently outperforms both zero-shot and few-shot baselines.
arXiv Detail & Related papers (2023-11-13T07:12:57Z) - MISMATCH: Fine-grained Evaluation of Machine-generated Text with
Mismatch Error Types [68.76742370525234]
We propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.
Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types.
We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
arXiv Detail & Related papers (2023-06-18T01:38:53Z) - CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z) - Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study
in Polish [0.0]
Inflected languages make word forms sparse, making most statistical procedures complicated.
This paper examines the usefulness of grammatical features (as assessed via POS-tag n-grams) and lemmatized forms in recognizing authors' stylistic profiles.
arXiv Detail & Related papers (2022-06-05T15:48:16Z) - More Than Words: Collocation Tokenization for Latent Dirichlet
Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z) - Representing Numbers in NLP: a Survey and a Vision [15.035458171592191]
We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods.
We analyze the myriad representational choices made by 18 previously published number encoders and decoders.
We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP.
arXiv Detail & Related papers (2021-03-24T12:28:22Z) - Meta-Embeddings for Natural Language Inference and Semantic Similarity
tasks [0.0]
Word Representations form the core component for almost all advanced Natural Language Processing (NLP) applications.
In this paper, we propose to use Meta Embedding derived from a few State-of-the-Art (SOTA) models to efficiently tackle mainstream NLP tasks.
arXiv Detail & Related papers (2020-12-01T16:58:01Z)