GLGE: A New General Language Generation Evaluation Benchmark
- URL: http://arxiv.org/abs/2011.11928v3
- Date: Tue, 1 Jun 2021 08:01:50 GMT
- Title: GLGE: A New General Language Generation Evaluation Benchmark
- Authors: Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao,
Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen,
Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan
- Abstract summary: General Language Generation Evaluation (GLGE) is a new multi-task benchmark for evaluating the generalization capabilities of NLG models.
To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines.
- Score: 139.25515221280767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress
of pretraining and transfer learning in Natural Language Processing (NLP).
These benchmarks mostly focus on a range of Natural Language Understanding
(NLU) tasks, without considering the Natural Language Generation (NLG) models.
In this paper, we present the General Language Generation Evaluation (GLGE), a
new multi-task benchmark for evaluating the generalization capabilities of NLG
models across eight language generation tasks. For each task, we further
design three subtasks by difficulty level (GLGE-Easy, GLGE-Medium, and
GLGE-Hard), yielding 24 subtasks for a comprehensive comparison of model
performance. To encourage research on pretraining and transfer learning on NLG
models, we make GLGE publicly available and build a leaderboard with strong
baselines including MASS, BART, and ProphetNet (The source code and dataset are
publicly available at https://github.com/microsoft/glge).
Related papers
- UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks.
We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z)
- bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark [28.472036496534116]
bgGLUE is a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian.
We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark.
arXiv Detail & Related papers (2023-06-04T12:54:00Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers into pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders on vision-language tasks; we alter the learning objective to suit the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks [52.918087305406296]
We introduce e-ViL, a benchmark for evaluating natural language explanations in vision-language tasks.
We also introduce e-SNLI-VE, the largest existing dataset with NLEs.
We propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model.
arXiv Detail & Related papers (2021-05-08T18:46:33Z)
- Schema-Guided Natural Language Generation [13.11874946084068]
We present the novel task of Schema-Guided Natural Language Generation (SG-NLG).
In SG-NLG, the goal is still to generate a natural language output from meaning representations (MRs), but the input MRs are paired with rich schemata providing contextual information.
We train several state-of-the-art neural natural language generation models on this dataset and show that, in many cases, including the rich schema information allows our models to produce higher-quality outputs.
arXiv Detail & Related papers (2020-05-11T23:01:22Z)
- KLEJ: Comprehensive Benchmark for Polish Language Understanding [4.702729080310267]
We introduce a comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard.
We also release HerBERT, a Transformer-based model trained specifically for the Polish language, which has the best average performance and obtains the best results for three out of nine tasks.
arXiv Detail & Related papers (2020-05-01T21:55:40Z)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation [100.09099800591822]
XGLUE is a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models.
XGLUE provides 11 diversified tasks that cover both natural language understanding and generation scenarios.
arXiv Detail & Related papers (2020-04-03T07:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.