KLUE: Korean Language Understanding Evaluation
- URL: http://arxiv.org/abs/2105.09680v2
- Date: Fri, 21 May 2021 05:54:22 GMT
- Title: KLUE: Korean Language Understanding Evaluation
- Authors: Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han,
Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong
Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo,
Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung
Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy
Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
- Abstract summary: We introduce Korean Language Understanding Evaluation (KLUE) benchmark.
KLUE is a collection of 8 Korean natural language understanding (NLU) tasks.
We build all of the tasks from scratch from diverse source corpora while respecting copyrights.
- Score: 43.94952771238633
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE
is a collection of 8 Korean natural language understanding (NLU) tasks,
including Topic Classification, Semantic Textual Similarity, Natural Language
Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing,
Machine Reading Comprehension, and Dialogue State Tracking. We build all of the
tasks from scratch from diverse source corpora while respecting copyrights, to
ensure accessibility for anyone without any restrictions. With ethical
considerations in mind, we carefully design annotation protocols. Along with
the benchmark tasks and data, we provide suitable evaluation metrics and
fine-tuning recipes for pretrained language models for each task. We
furthermore release the pretrained language models (PLMs), KLUE-BERT and
KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby
facilitate future research. Preliminary experiments with the proposed KLUE
benchmark suite yield a few interesting observations that already demonstrate
its usefulness. First, we find
KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and
existing open-source Korean PLMs. Second, we see minimal degradation in
performance even when we replace personally identifiable information in the
pretraining corpus, suggesting that privacy and NLU capability are not at odds
with each other. Lastly, we find that using BPE tokenization in combination
with morpheme-level pre-tokenization is effective for tasks involving
morpheme-level tagging, detection, and generation. In addition to accelerating
Korean NLP research, our comprehensive documentation on creating KLUE will
facilitate creating similar resources for other languages in the future. KLUE
is available at https://klue-benchmark.com/.
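As a concrete illustration of the released baselines, the sketch below loads a
KLUE checkpoint and benchmark data through the Hugging Face ecosystem. The
identifiers klue/roberta-large and the klue/ynat dataset config are assumptions
based on common community hosting, not names confirmed by the abstract; verify
them before use.

```python
# Hedged sketch, not the authors' official recipe: loading a released KLUE PLM
# for task fine-tuning. Assumes the checkpoints and data are mirrored on the
# Hugging Face Hub under the "klue" namespace.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "klue/roberta-large"  # assumed Hub identifier for KLUE-RoBERTa
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=7,  # KLUE Topic Classification (YNAT) uses 7 topic labels
)

dataset = load_dataset("klue", "ynat")  # assumed dataset and config names
batch = tokenizer(
    dataset["train"]["title"][:8],  # YNAT examples are news headlines
    padding=True,
    truncation=True,
    return_tensors="pt",
)
logits = model(**batch).logits  # shape (8, 7); fine-tune with a standard loop
```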
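The tokenization finding can likewise be sketched. Below is a minimal
illustration of morpheme-level pre-tokenization followed by BPE, assuming the
KoNLPy Mecab analyzer and the Hugging Face tokenizers library; it demonstrates
the general technique rather than the KLUE authors' exact pipeline.

```python
# Hedged illustration of morpheme-aware BPE: segment text into morphemes
# first, then train/apply BPE so merges never cross morpheme boundaries.
from konlpy.tag import Mecab
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

mecab = Mecab()

def morpheme_pretokenize(text: str) -> str:
    # Insert spaces at morpheme boundaries so BPE treats them as hard limits.
    return " ".join(mecab.morphs(text))

corpus = [
    morpheme_pretokenize(line)
    for line in ["자연어 이해는 어렵다.", "한국어 벤치마크를 만들었다."]
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=32000,
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train_from_iterator(corpus, trainer)

print(tokenizer.encode(morpheme_pretokenize("자연어 벤치마크")).tokens)
```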
Related papers
- AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark (arXiv, 2024-08-27)
  AAVENUE is a benchmark for evaluating large language model (LLM) performance on NLU tasks in African American Vernacular English (AAVE) and Standard American English (SAE).
  We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics, including fluency, BARTScore, quality, coherence, and understandability.
  Our evaluations reveal that LLMs consistently perform better on SAE tasks than on their AAVE-translated versions, underscoring inherent biases.
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus (arXiv, 2024-05-20)
  Natural language inference is a proxy for natural language understanding.
  Until now, no NLI corpus has been publicly available for Romanian.
  We introduce the first Romanian NLI corpus (RoNLI), comprising 58K training sentence pairs.
- NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems (arXiv, 2024-03-07)
  It is arduous to compare novel solutions against well-entrenched preprocessing toolkits that rely on rule-based morphological analysers or dictionaries.
  Inspired by the GLUE benchmark, the proposed language-centric benchmarking system enables comprehensive ongoing evaluation of multiple NLPre tools.
  The prototype application is configured for Polish and integrated with the thoroughly assembled NLPre-PL benchmark.
- CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation (arXiv, 2023-10-24)
  We propose CoAnnotating, a novel paradigm for human-LLM co-annotation of unstructured texts at scale.
  Our empirical study shows CoAnnotating to be an effective means of allocating work across datasets, with up to a 21% performance improvement over a random baseline.
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish (arXiv, 2022-11-23)
  We introduce LEPISZCZE, a new, comprehensive benchmark for Polish NLP.
  We use five datasets from the existing Polish benchmark and add eight novel datasets.
  We share the insights and experience gained while creating the benchmark for Polish as a blueprint for designing similar benchmarks for other low-resourced languages.
- KOBEST: Korean Balanced Evaluation of Significant Tasks (arXiv, 2022-04-09)
  A well-formulated benchmark plays a critical role in spurring advancements in natural language processing (NLP).
  We propose a new benchmark, Korean Balanced Evaluation of Significant Tasks (KoBEST), which consists of five Korean-language downstream tasks.
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages (arXiv, 2022-01-27)
  We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
  IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
  We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
- On Cross-Lingual Retrieval with Multilingual Text Encoders (arXiv, 2021-12-21)
  We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
  We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
  We also evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback (arXiv, 2021-07-30)
  ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
  It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
- KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding (arXiv, 2020-04-07)
  Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU).
  Previously, no NLI or STS datasets were publicly available in Korean.
  We construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively.