Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating
Holistic Domain Knowledge of Large Language Model--A Preliminary Release
- URL: http://arxiv.org/abs/2304.11679v2
- Date: Thu, 10 Aug 2023 05:27:58 GMT
- Title: Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating
Holistic Domain Knowledge of Large Language Model--A Preliminary Release
- Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Zhuozhi Xiong, Zihan
Li, Qianyu He, Sihang Jiang, Hongwei Feng, Yanghua Xiao
- Abstract summary: DomMa targets testing Large Language Models (LLMs) on their domain knowledge understanding.
It features extensive domain coverage, large data volume, and a continually updated data set based on China's 112 first-level subject classifications.
- Score: 13.603414598813938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain knowledge refers to the in-depth understanding, expertise, and
familiarity with a specific subject, industry, field, or area of special
interest. Existing benchmarks all lack an overall design for domain
knowledge evaluation. Believing that real domain language understanding can
only be fairly evaluated by a comprehensive and in-depth benchmark, we
introduce DomMa, a Domain Mastery Benchmark. DomMa targets testing Large
Language Models (LLMs) on their domain knowledge understanding; it features
extensive domain coverage, large data volume, and a continually updated data
set based on China's 112 first-level subject classifications. DomMa consists
of 100,000 questions in both Chinese and English, sourced from graduate
entrance examinations and undergraduate exams at Chinese colleges. We also
propose designs to make the benchmark and evaluation process more suitable
for LLMs.
Related papers
- Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning [55.107329995417786]
Large language models (LLMs) have demonstrated impressive general understanding and generation abilities.
We establish a benchmark for multi-domain translation, featuring 25 German⇔English and 22 Chinese⇔English test sets.
We propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance.
arXiv Detail & Related papers (2024-10-03T16:15:04Z) - Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain [4.133477882188227]
This paper presents our findings from training and evaluating a Japanese business domain-specific LLM.
Our pretrained model and business domain benchmark are publicly available to support further studies.
arXiv Detail & Related papers (2024-04-12T06:21:48Z) - Systematic Assessment of Factual Knowledge in Large Language Models [48.75961313441549]
This paper proposes a framework to assess the factual knowledge of large language models (LLMs) by leveraging knowledge graphs (KGs).
Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions.
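The KG-based evaluation idea described above can be illustrated with a minimal sketch: turn stored triples into question/answer pairs via templates, then score a model's answers by exact match. The triples, templates, and `toy_model` below are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch (assumed, not the paper's implementation): generate cloze-style
# questions from knowledge-graph triples and score answers by exact match.

TEMPLATES = {
    "capital_of": "What is the capital of {subject}?",
    "author_of": "Who wrote {subject}?",
}

def generate_qa(triples):
    """Turn (subject, relation, object) facts into (question, answer) pairs."""
    pairs = []
    for subject, relation, obj in triples:
        template = TEMPLATES.get(relation)
        if template:
            pairs.append((template.format(subject=subject), obj))
    return pairs

def evaluate(model, qa_pairs):
    """Fraction of questions the model answers exactly (case-insensitive)."""
    correct = sum(
        model(q).strip().lower() == a.strip().lower() for q, a in qa_pairs
    )
    return correct / len(qa_pairs) if qa_pairs else 0.0

triples = [
    ("France", "capital_of", "Paris"),
    ("Hamlet", "author_of", "William Shakespeare"),
]
qa = generate_qa(triples)

# A stand-in "model" that gets one of the two questions right.
toy_model = lambda q: "Paris" if "capital" in q else "Christopher Marlowe"
print(evaluate(toy_model, qa))  # → 0.5
```

A real harness would draw relations and templates from the KG schema and use a more forgiving answer matcher (aliases, normalization), but the generate-then-grade loop is the core of the approach.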
arXiv Detail & Related papers (2023-10-18T00:20:50Z) - NuclearQA: A Human-Made Benchmark for Language Models for the Nuclear
Domain [0.0]
NuclearQA is a human-made benchmark of 100 questions to evaluate language models in the nuclear domain.
We show how the mix of several types of questions makes our benchmark uniquely capable of evaluating models in the nuclear domain.
arXiv Detail & Related papers (2023-10-17T01:27:20Z) - Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge
Evaluation [61.56563631219381]
We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge.
Xiezhi comprises 249,587 multiple-choice questions across 516 diverse disciplines spanning 13 different subjects, accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, each with 15k questions.
arXiv Detail & Related papers (2023-06-09T09:52:05Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - DomBERT: Domain-oriented Language Model for Aspect-based Sentiment
Analysis [71.40586258509394]
We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora.
Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.
arXiv Detail & Related papers (2020-04-28T21:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.