CPM: A Large-scale Generative Chinese Pre-trained Language Model
- URL: http://arxiv.org/abs/2012.00413v1
- Date: Tue, 1 Dec 2020 11:32:56 GMT
- Title: CPM: A Large-scale Generative Chinese Pre-trained Language Model
- Authors: Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia
Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu,
Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
- Abstract summary: We release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data.
CPM achieves strong performance on many NLP tasks in the settings of few-shot (even zero-shot) learning.
- Score: 76.65305358932393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Language Models (PLMs) have proven to be beneficial for various
downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570GB
training data, drew a lot of attention due to the capacity of few-shot (even
zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is
still challenging, as the training corpus of GPT-3 is primarily English, and
the parameters are not publicly available. In this technical report, we release
the Chinese Pre-trained Language Model (CPM) with generative pre-training on
large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6
billion parameters and 100GB Chinese training data, is the largest Chinese
pre-trained language model, which could facilitate several downstream Chinese
NLP tasks, such as conversation, essay generation, cloze test, and language
understanding. Extensive experiments demonstrate that CPM achieves strong
performance on many NLP tasks in the settings of few-shot (even zero-shot)
learning. The code and parameters are available at
https://github.com/TsinghuaAI/CPM-Generate.
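The abstract describes prompting CPM as a generative model for tasks such as cloze tests and conversation in a zero-shot setting. As a rough, hedged illustration only (not the authors' release script), the sketch below shows what zero-shot prompting of such a causal language model could look like with the Hugging Face transformers API; the model identifier "TsinghuaAI/CPM-Generate" is an assumption here, since the official checkpoints and inference code are distributed through the GitHub repository above.

```python
# Hypothetical sketch: zero-shot prompting of a generative Chinese LM.
# The model id below is an assumption; the official CPM weights and scripts
# are distributed via https://github.com/TsinghuaAI/CPM-Generate.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TsinghuaAI/CPM-Generate"  # assumed Hub id, not confirmed here

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# A cloze-style zero-shot prompt: the task is expressed purely as text,
# and the model's continuation is read off as the answer.
prompt = "中国的首都是"  # "The capital of China is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding for a deterministic illustration
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```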
Related papers
- InstructionCP: A fast approach to transfer Large Language Models into target language [55.2480439325792]
InsCP integrates instruction tags into the continual pre-training (CP) process to prevent loss of conversational proficiency while acquiring new languages.
Our experiments demonstrate that InsCP retains conversational and Reinforcement Learning from Human Feedback abilities.
This approach requires only 0.1 billion tokens of high-quality instruction-following data, thereby reducing resource consumption.
arXiv Detail & Related papers (2024-05-30T15:45:13Z)
- Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training [25.510288465345592]
Unlike for English, the natural language processing (NLP) community lacks high-performing open-source Chinese knowledge-enhanced pre-trained language models (KEPLMs) to support various language understanding applications.
Here, we revisit and advance the development of Chinese natural language understanding with a series of novel Chinese KEPLMs released in various parameter sizes.
Specifically, both relational and linguistic knowledge are effectively injected into CKBERT through two novel pre-training tasks.
arXiv Detail & Related papers (2022-10-11T09:34:21Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, adaptively retrieving external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
- Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning [18.932100477957462]
Recent work such as GPT-3 has demonstrated excellent zero-shot and few-shot learning performance on many natural language processing (NLP) tasks.
We propose a method that incorporates large-scale distributed training performance into model architecture design.
arXiv Detail & Related papers (2021-10-10T07:40:22Z)
- FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark [8.158067688043554]
This work first introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive small-sample evaluation benchmark in Chinese.
An unlabeled training set with up to 20,000 additional samples per task is provided, allowing researchers to explore better ways of using unlabeled samples.
Next, we implement a set of state-of-the-art few-shot learning methods, and compare their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark.
arXiv Detail & Related papers (2021-07-15T17:51:25Z)
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [25.430130072811075]
We propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.
It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks.
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
arXiv Detail & Related papers (2021-07-05T16:54:59Z)
- PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation [58.31465205357637]
We present our practice of training large-scale autoregressive language models, named PanGu-$\alpha$, with up to 200 billion parameters.
PanGu-$\alpha$ is developed under MindSpore and trained on a cluster of 2048 Ascend 910 AI processors.
arXiv Detail & Related papers (2021-04-26T06:59:36Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.) to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
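The GPT-3 entry above, like the CPM abstract, relies on few-shot in-context learning: a handful of labeled demonstrations are concatenated with the query so that the frozen model completes the pattern. A minimal, model-agnostic sketch of that prompt construction follows; the instruction wording, demonstrations, and label names are illustrative only and are not taken from any of the listed papers.

```python
# Minimal sketch of few-shot (in-context) prompt construction.
# Demonstrations and labels below are made up for illustration.

def build_few_shot_prompt(demonstrations, query, instruction="判断下面句子的情感："):
    """Concatenate an instruction, labeled demonstrations, and the query.

    A frozen generative language model is then asked to continue the text;
    its continuation after the final "答案：" is read off as the prediction.
    """
    parts = [instruction]
    for text, label in demonstrations:
        parts.append(f"句子：{text}\n答案：{label}")
    parts.append(f"句子：{query}\n答案：")
    return "\n\n".join(parts)

demos = [
    ("这部电影太精彩了。", "正面"),  # "This film is wonderful." -> positive
    ("服务态度很差。", "负面"),      # "The service was terrible." -> negative
]
prompt = build_few_shot_prompt(demos, "风景优美，值得一去。")
print(prompt)  # feed this string to any generative LM, e.g. via model.generate
```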