Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating
Holistic Domain Knowledge of Large Language Model--A Preliminary Release
- URL: http://arxiv.org/abs/2304.11679v2
- Date: Thu, 10 Aug 2023 05:27:58 GMT
- Title: Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating
Holistic Domain Knowledge of Large Language Model--A Preliminary Release
- Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Zhuozhi Xiong, Zihan Li, Qianyu He, Sihang Jiang, Hongwei Feng, Yanghua Xiao
- Abstract summary: DomMa is designed to test Large Language Models (LLMs) on their understanding of domain knowledge.
It features extensive domain coverage, a large data volume, and a continually updated data set based on China's 112 first-level subject classifications.
- Score: 13.603414598813938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain knowledge refers to the in-depth understanding, expertise, and
familiarity with a specific subject, industry, field, or area of special
interest. Existing benchmarks all lack an overall design for domain
knowledge evaluation. Holding the belief that the real ability of domain
language understanding can only be fairly evaluated by a comprehensive and
in-depth benchmark, we introduce DomMa, a Domain Mastery Benchmark. DomMa
is designed to test Large Language Models (LLMs) on their understanding of
domain knowledge. It features extensive domain coverage, a large data volume,
and a continually updated data set based on China's 112 first-level subject
classifications. DomMa consists of 100,000 questions in both Chinese and English,
sourced from graduate entrance examinations and undergraduate exams at Chinese
colleges. We also propose designs to make the benchmark and the evaluation
process more suitable for LLMs.
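The abstract does not state DomMa's question format or scoring rule. As a hedged illustration only, assuming multiple-choice items that each carry one first-level subject label and are scored by exact match on the answer letter, per-subject results could be aggregated as in the sketch below. The `Item` schema and the `per_subject_accuracy` helper are hypothetical and are not taken from the DomMa release.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question (hypothetical schema, not DomMa's released format)."""
    subject: str   # first-level subject classification, e.g. "Law"
    language: str  # "zh" or "en"
    question: str
    answer: str    # gold answer label, e.g. "B"


def per_subject_accuracy(items, predictions):
    """Score predictions against gold answers, grouped by subject.

    Returns (per-subject accuracy, micro average, macro average).
    Micro weights every question equally; macro weights every subject
    equally, so small subjects are not drowned out by large ones.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.subject] += 1
        # Exact match after trimming whitespace and ignoring case.
        if pred.strip().upper() == item.answer.strip().upper():
            correct[item.subject] += 1
    scores = {s: correct[s] / total[s] for s in total}
    if not scores:
        return scores, 0.0, 0.0
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(scores.values()) / len(scores)
    return scores, micro, macro


if __name__ == "__main__":
    items = [
        Item("Law", "zh", "q1", "A"),
        Item("Law", "en", "q2", "C"),
        Item("Medicine", "en", "q3", "B"),
    ]
    scores, micro, macro = per_subject_accuracy(items, ["A", "c", "D"])
    print(scores)  # {'Law': 1.0, 'Medicine': 0.0}
    print(micro, macro)
```

Reporting the macro average alongside the micro average is one way to keep a benchmark spanning many subjects from being dominated by its largest subjects; whether DomMa weights subjects this way is not stated in the abstract.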
Related papers
- LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models [46.77647640464652]
Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications.
We propose LHMKE, a Large-scale, Holistic, and Multi-subject Knowledge Evaluation benchmark.
It encompasses 10,465 questions across 75 tasks covering 30 subjects, ranging from primary school to professional certification exams.
(2024-03-19T10:11:14Z)
- ArcMMLU: A Library and Information Science Benchmark for Large Language Models [25.36473762494066]
This paper introduces ArcMMLU, a benchmark tailored for the Library & Information Science (LIS) domain in Chinese.
This benchmark aims to measure the knowledge and reasoning capability of LLMs within four key sub-domains: Archival Science, Data Science, Library Science, and Information Science.
Our comprehensive evaluation reveals that while most mainstream LLMs achieve an average accuracy rate above 50% on ArcMMLU, there remains a notable performance gap.
(2023-11-30T16:08:04Z)
- Systematic Assessment of Factual Knowledge in Large Language Models [48.75961313441549]
This paper proposes a framework to assess the factual knowledge of large language models (LLMs) by leveraging knowledge graphs (KGs).
Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions.
(2023-10-18T00:20:50Z)
- NuclearQA: A Human-Made Benchmark for Language Models for the Nuclear Domain [0.0]
NuclearQA is a human-made benchmark of 100 questions to evaluate language models in the nuclear domain.
We show how the mix of several types of questions makes our benchmark uniquely capable of evaluating models in the nuclear domain.
(2023-10-17T01:27:20Z)
- Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation [61.56563631219381]
We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge.
Xiezhi comprises 249,587 multiple-choice questions across 516 diverse disciplines drawn from 13 subjects, and is accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, each with 15k questions.
(2023-06-09T09:52:05Z)
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains faces many hurdles.
(2023-05-30T03:00:30Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
(2023-04-27T18:29:05Z)
- PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment [58.46761798403072]
A model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains.
Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another.
We propose a Panel of Experts (PoE) network that consists of a shared transformer encoder and a collection of lightweight adapters.
(2022-12-18T02:26:50Z)
- DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis [71.40586258509394]
We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora.
Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.
(2020-04-28T21:07:32Z)
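The knowledge-graph approach summarized for "Systematic Assessment of Factual Knowledge in Large Language Models" (generate questions and expected answers from KG facts, then measure answer accuracy) can be sketched as follows. This is a minimal illustration assuming simple per-relation question templates and case-insensitive exact-match scoring; the `TEMPLATES` table, the helper names, and the canned stand-in "model" are all hypothetical, and that paper's actual generation and scoring methods may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) fact from a knowledge graph."""
    subject: str
    relation: str
    obj: str


# Hypothetical relation-to-template table: each template mentions the
# triple's subject and asks for its object. A real KG needs one per relation.
TEMPLATES = {
    "capital_of": "{subject} is the capital of which country?",
    "born_in": "In which city was {subject} born?",
}


def generate_questions(triples):
    """Turn KG triples into (question, expected_answer) pairs.

    Triples whose relation has no template are skipped.
    """
    pairs = []
    for t in triples:
        template = TEMPLATES.get(t.relation)
        if template is not None:
            pairs.append((template.format(subject=t.subject), t.obj))
    return pairs


def normalize(text: str) -> str:
    # Case-insensitive, whitespace-trimmed comparison key.
    return text.strip().casefold()


def accuracy(pairs, answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions whose model answer matches the expected object."""
    if not pairs:
        return 0.0
    hits = sum(normalize(answer_fn(q)) == normalize(a) for q, a in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    kg = [
        Triple("Paris", "capital_of", "France"),
        Triple("Marie Curie", "born_in", "Warsaw"),
        Triple("Paris", "located_on", "Seine"),  # no template: skipped
    ]
    qa = generate_questions(kg)
    # Stand-in "model": a fixed lookup table instead of a real LLM call.
    canned = {
        "Paris is the capital of which country?": "france",
        "In which city was Marie Curie born?": "Paris",
    }
    print(qa)
    print(accuracy(qa, lambda q: canned.get(q, "")))  # 0.5
```

Exact match is the simplest scoring rule; free-form LLM answers often need more lenient matching (aliases, partial answers), which this sketch deliberately leaves out.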