Intrinsic Knowledge Evaluation on Chinese Language Models
- URL: http://arxiv.org/abs/2011.14277v1
- Date: Sun, 29 Nov 2020 04:34:39 GMT
- Title: Intrinsic Knowledge Evaluation on Chinese Language Models
- Authors: Zhiruo Wang, Renfen Hu
- Abstract summary: This paper proposes four tasks on syntactic, semantic, commonsense, and factual knowledge, totaling 39,308 questions.
Our probes and knowledge data prove to be a reliable benchmark for evaluating pre-trained Chinese LMs.
- Score: 5.293979881130493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent NLP tasks have benefited greatly from pre-trained language models (LMs), since such models encode knowledge of various aspects. However, current LM evaluations focus on downstream performance and hence fail to comprehensively inspect in which aspects, and to what extent, these models have encoded knowledge. This paper addresses both questions by proposing four tasks on syntactic, semantic, commonsense, and factual knowledge, totaling 39,308 questions that cover both linguistic and world knowledge in Chinese. Throughout our experiments, the probes and knowledge data prove to be a reliable benchmark for evaluating pre-trained Chinese LMs. Our work is publicly available at https://github.com/ZhiruoWang/ChnEval.
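The probes pose knowledge questions directly to a masked LM, with no fine-tuning. Below is a minimal sketch of how one such cloze-style probe could be scored; it assumes the HuggingFace transformers library and the bert-base-chinese checkpoint, and the question, candidates, and scoring rule are illustrative rather than the authors' exact protocol (see the repository above for that).

```python
# Minimal sketch of cloze-style knowledge probing with a Chinese masked LM.
# Assumptions (not from the paper): HuggingFace transformers, the
# bert-base-chinese checkpoint, and an illustrative question/candidate set.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def score_candidates(template, candidates):
    """Rank single-character candidates by MLM probability at [MASK]."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    ids = tokenizer.convert_tokens_to_ids(candidates)
    return sorted(
        ((c, probs[i].item()) for c, i in zip(candidates, ids)),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Illustrative factual probe: "The capital of China is Bei[MASK]."
print(score_candidates("中国的首都是北[MASK]。", ["京", "海", "方"]))
```

Ranking candidate fillers by the probability the LM assigns at the [MASK] position turns each probe into a multiple-choice question that needs no task-specific training, which is what makes it a test of intrinsic rather than downstream knowledge.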
Related papers
- Benchmarking Chinese Knowledge Rectification in Large Language Models [43.9841600678381]
This paper introduces a benchmark for rectifying Chinese knowledge in Large Language Models via knowledge editing.
We collect seven types of knowledge from various sources, including classical texts, idioms, and content from Baidu Tieba Ruozhiba.
Through the analysis of this dataset, we uncover the challenges faced by current LLMs in mastering Chinese.
arXiv Detail & Related papers (2024-09-09T17:11:51Z)
- LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models [46.77647640464652]
Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications.
We propose LHMKE, a Large-scale, Holistic, and Multi-subject Knowledge Evaluation benchmark.
It encompasses 10,465 questions across 75 tasks covering 30 subjects, ranging from primary school to professional certification exams.
arXiv Detail & Related papers (2024-03-19T10:11:14Z)
- FAC^2E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC^2E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models [53.9835961434552]
We introduce the Chinese Instruction-Following Benchmark (CIF-Bench) to evaluate the generalizability of large language models (LLMs) to the Chinese language.
CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances.
To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance.
arXiv Detail & Related papers (2024-02-20T16:02:12Z)
- Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
However, the effect of editing in a source language on a different target language remains unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help pre-trained language models utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) knowledge, and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z)
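A practical complication for probes like X-FACTR's is that many answers span several tokens, so a single [MASK] slot is not enough. The sketch below shows the simplest workaround, greedy left-to-right refilling of a fixed-length span; X-FACTR itself evaluates several more sophisticated decoding strategies, and the prompt and span length here are illustrative assumptions, again using bert-base-chinese.

```python
# Minimal sketch of multi-token cloze filling, in the spirit of X-FACTR's
# multilingual probes. Only the simplest decoding strategy is shown
# (greedy left-to-right refilling); the prompt and span length are
# illustrative assumptions, not taken from the paper.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def fill_span(template, span_len):
    """Fill `span_len` [MASK] slots one at a time, leftmost first."""
    text = template.replace("[SPAN]", "[MASK]" * span_len)
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    for _ in range(span_len):
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0]
        # Locate the leftmost remaining [MASK] and commit its argmax token.
        pos = (ids[0] == tokenizer.mask_token_id).nonzero().flatten()[0].item()
        ids[0, pos] = logits[pos].argmax().item()
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# Illustrative probe: "The author of Romance of the Three Kingdoms is [SPAN]."
print(fill_span("《三国演义》的作者是[SPAN]。", span_len=3))
```

Since the true answer length is unknown in advance, an X-FACTR-style evaluation would typically try several span lengths and keep the best-scoring completion; the fixed length here is only for brevity.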
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.