ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models
- URL: http://arxiv.org/abs/2205.01523v1
- Date: Tue, 3 May 2022 14:18:10 GMT
- Title: ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models
- Authors: Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Zhipeng
Chen, Jingyuan Wang, Wayne Xin Zhao and Ji-Rong Wen
- Abstract summary: We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
- Score: 78.08792285698853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, pretrained language models (PLMs) have dominated the majority of
NLP tasks. However, little research has been conducted on systematically
evaluating the language abilities of PLMs. In this paper, we present a
large-scale empirical study on general language ability evaluation of PLMs
(ElitePLM). In our study, we design four evaluation dimensions, i.e., memory,
comprehension, reasoning, and composition, to measure ten widely-used PLMs
within five categories. Our empirical results demonstrate that: (1) PLMs with
varying training objectives and strategies are good at different ability tests;
(2) fine-tuning PLMs on downstream tasks is usually sensitive to data size
and distribution; (3) PLMs have excellent transferability between similar
tasks. Moreover, the prediction results of PLMs in our experiments are released
as an open resource for deeper and more detailed analysis of the language
abilities of PLMs. This paper can guide future work in selecting, applying, and
designing PLMs for specific tasks. We have made all the details of our experiments
publicly available at https://github.com/RUCAIBox/ElitePLM.
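As a concrete illustration of the kind of fine-tuning experiments behind finding (2), the sketch below fine-tunes one PLM on increasingly large subsamples of a downstream task while keeping everything else fixed. It assumes the Hugging Face transformers and datasets libraries; the bert-base-uncased checkpoint and the SST-2 task are illustrative stand-ins, not the paper's exact configuration.

```python
# Sketch: fine-tune a PLM on subsamples of a downstream task to probe how
# sensitive its performance is to training data size (finding 2).
# Model/dataset choices are illustrative, not ElitePLM's exact setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"        # one of many PLMs one could test
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("glue", "sst2")  # a comprehension-style task

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

for train_size in (500, 5000, 50000):   # vary data size, keep all else fixed
    model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                               num_labels=2)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"out-{train_size}",
                               num_train_epochs=3),
        train_dataset=encoded["train"].shuffle(seed=42).select(range(train_size)),
        eval_dataset=encoded["validation"],
    )
    trainer.train()
    print(train_size, trainer.evaluate())  # eval loss as a rough proxy
```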
Related papers
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z)
- Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages [0.0]
"prompting" is where a user provides a description of a task and some completed examples of the task to a PLM as context before prompting the PLM to perform the task on a new example.
We consider three methods: few-shot prompting (prompt), language-adaptive fine-tuning (LAFT), and neural machine translation (translate)
We find that translate and prompt settings are a compute-efficient and cost-effective method of few-shot prompting for the selected low-resource languages.
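The prompt itself is just assembled context: a task description, a few completed demonstrations, and the new input left for the PLM to complete. A minimal sketch with an invented sentiment task (the task and examples are placeholders, not the paper's):

```python
# Sketch: build a few-shot prompt -- a task description plus completed
# examples, followed by the new input the PLM should label.
# The task and examples here are invented placeholders.
task_description = "Classify the sentiment of the sentence as positive or negative."
demonstrations = [
    ("The food was wonderful.", "positive"),
    ("I will never come back here.", "negative"),
]
new_input = "The staff were friendly and helpful."

prompt = task_description + "\n\n"
for text, label in demonstrations:
    prompt += f"Sentence: {text}\nLabel: {label}\n\n"
prompt += f"Sentence: {new_input}\nLabel:"  # the PLM completes this line

print(prompt)
```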
arXiv Detail & Related papers (2024-03-09T21:36:13Z)
- Linguistic Intelligence in Large Language Models for Telecommunications [5.06945923921948]
Large Language Models (LLMs) have emerged as a significant advancement in the field of Natural Language Processing (NLP).
This study seeks to evaluate the knowledge and understanding capabilities of LLMs within the telecommunications domain.
Our evaluation reveals that zero-shot LLMs can achieve performance levels comparable to the current state-of-the-art fine-tuned models.
arXiv Detail & Related papers (2024-02-24T14:01:07Z)
- PRE: A Peer Review Based Large Language Model Evaluator [14.585292530642603]
Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs.
We propose a novel framework that can automatically evaluate LLMs through a peer-review process.
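Generically, a peer-review evaluation loop has several "reviewer" LLMs score a candidate answer and aggregates their scores, weighted by each reviewer's qualification. The sketch below illustrates that shape; the reviewer interface and weighting are hypothetical, and PRE's actual qualification and aggregation procedures are defined in the paper.

```python
# Sketch of a peer-review-style evaluation loop: each reviewer model scores
# the candidate's answer, and scores are combined by reviewer weight.
# `reviewers` is a hypothetical list of (name, weight, score_fn) triples;
# the real PRE framework defines its own qualification and weighting.
def peer_review(question: str, answer: str, reviewers) -> float:
    total, weight_sum = 0.0, 0.0
    for name, weight, score_fn in reviewers:
        score = score_fn(question, answer)  # e.g., an LLM returning 1-10
        total += weight * score
        weight_sum += weight
    return total / weight_sum

# Toy usage with stubbed reviewer functions:
reviewers = [
    ("reviewer-a", 1.0, lambda q, a: 8.0),
    ("reviewer-b", 0.5, lambda q, a: 6.0),
]
print(peer_review("What is 2+2?", "4", reviewers))  # -> 7.33...
```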
arXiv Detail & Related papers (2024-01-28T12:33:14Z)
- Through the Lens of Core Competency: Survey on Evaluation of Large Language Models [27.271533306818732]
Large language models (LLMs) have excellent performance and wide practical uses.
Existing evaluation tasks struggle to keep up with the wide range of applications in real-world scenarios.
We summarize 4 core competencies of LLMs: reasoning, knowledge, reliability, and safety.
Under this competency architecture, similar tasks are combined to reflect the corresponding ability, and new tasks can easily be added to the system.
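In code, such a competency architecture reduces to a task-to-competency mapping under which per-task scores roll up into per-ability scores; adding a task means extending the mapping. A minimal sketch with illustrative benchmark names and made-up scores (the assignments and numbers are not the survey's):

```python
# Sketch: roll per-task scores up into core competencies.
# Task assignments and scores are illustrative placeholders.
from collections import defaultdict

competency_of = {          # extend this dict to add a new task
    "gsm8k": "reasoning",
    "logiqa": "reasoning",
    "triviaqa": "knowledge",
    "truthfulqa": "reliability",
    "toxigen": "safety",
}
task_scores = {"gsm8k": 0.61, "logiqa": 0.48, "triviaqa": 0.72,
               "truthfulqa": 0.55, "toxigen": 0.83}

by_competency = defaultdict(list)
for task, score in task_scores.items():
    by_competency[competency_of[task]].append(score)

for competency, scores in by_competency.items():
    print(competency, sum(scores) / len(scores))
```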
arXiv Detail & Related papers (2023-08-15T17:40:34Z)
- Metacognitive Prompting Improves Understanding in Large Language Models [12.112914393948415]
We introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes.
We conduct experiments on four prevalent Large Language Models (LLMs) across ten natural language understanding (NLU) datasets.
MP consistently outperforms existing prompting methods in both general and domain-specific NLU tasks.
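Concretely, a metacognitive prompt walks the model through introspective stages before it commits to an answer. The sketch below assembles such a prompt; the stage wording paraphrases the general idea and is not the paper's exact template.

```python
# Sketch: assemble a metacognitive-style prompt that walks the model
# through introspective stages before answering. Stage wording is a
# paraphrase for illustration, not the paper's exact template.
stages = [
    "1. Restate your understanding of the input.",
    "2. Make a preliminary judgment.",
    "3. Critically evaluate that judgment.",
    "4. Give your final answer with a brief explanation.",
    "5. State your confidence in the answer.",
]
question = ("Does the sentence 'The trophy didn't fit in the suitcase' "
            "imply the trophy was too large?")

prompt = (f"Question: {question}\n\n"
          "Answer by working through these steps:\n" + "\n".join(stages))
print(prompt)
```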
arXiv Detail & Related papers (2023-08-10T05:10:17Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
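Pre-training on linguistic features alongside the original objective is, schematically, a multi-task loss: the masked-LM loss plus auxiliary token-level tagging losses on a shared encoder. A minimal PyTorch sketch follows; the three feature heads, their label-set sizes, and the auxiliary weight are assumptions for illustration, not LERT's exact recipe.

```python
# Sketch: multi-task pre-training loss combining masked LM with auxiliary
# token-level linguistic tagging heads. Heads and weights are illustrative.
import torch
import torch.nn as nn

hidden, vocab = 768, 30522
mlm_head = nn.Linear(hidden, vocab)   # original masked-LM objective
feature_heads = {                     # hypothetical label-set sizes
    "pos": nn.Linear(hidden, 40),
    "ner": nn.Linear(hidden, 9),
    "dep": nn.Linear(hidden, 45),
}
ce = nn.CrossEntropyLoss()

def total_loss(hidden_states, mlm_labels, feature_labels, aux_weight=0.1):
    # hidden_states: (batch, seq, hidden) from the shared encoder
    loss = ce(mlm_head(hidden_states).flatten(0, 1), mlm_labels.flatten())
    for name, head in feature_heads.items():
        logits = head(hidden_states).flatten(0, 1)
        loss = loss + aux_weight * ce(logits, feature_labels[name].flatten())
    return loss

# Toy call with random inputs:
B, S = 2, 16
hs = torch.randn(B, S, hidden)
feats = {name: torch.randint(0, head.out_features, (B, S))
         for name, head in feature_heads.items()}
print(total_loss(hs, torch.randint(0, vocab, (B, S)), feats))
```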
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Knowledge Inheritance for Pre-trained Language Models [57.51305807391381]
We introduce a novel pre-training framework named "knowledge inheritance" (KI).
KI combines self-learning and teacher-guided learning to efficiently train larger PLMs.
We show that KI supports lifelong learning and knowledge transfer well.
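Schematically, "self-learning plus teacher-guided learning" can be written as one interpolated objective. Below is a minimal sketch assuming a standard distillation term against an already-trained (smaller) teacher and a linear schedule for the mixing weight; KI's actual formulation and annealing schedule are specified in the paper.

```python
# Sketch: knowledge-inheritance-style objective mixing the student's own
# language-modeling loss with a distillation loss against a teacher.
# The linear alpha schedule is an illustrative assumption.
import torch
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, step, total_steps, temp=2.0):
    alpha = step / total_steps  # shift from teacher-guided to self-learning
    self_loss = F.cross_entropy(student_logits, labels)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temp, dim=-1),
        F.softmax(teacher_logits / temp, dim=-1),
        reduction="batchmean",
    ) * temp ** 2
    return alpha * self_loss + (1 - alpha) * kd_loss

# Toy call with random logits over a 100-word vocabulary:
logits_s, logits_t = torch.randn(8, 100), torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(ki_loss(logits_s, logits_t, labels, step=100, total_steps=1000))
```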
arXiv Detail & Related papers (2021-05-28T14:43:26Z)