TeleQnA: A Benchmark Dataset to Assess Large Language Models
Telecommunications Knowledge
- URL: http://arxiv.org/abs/2310.15051v1
- Date: Mon, 23 Oct 2023 15:55:15 GMT
- Title: TeleQnA: A Benchmark Dataset to Assess Large Language Models
Telecommunications Knowledge
- Authors: Ali Maatouk, Fadhel Ayed, Nicola Piovesan, Antonio De Domenico,
Merouane Debbah, Zhi-Quan Luo
- Abstract summary: TeleQnA is the first benchmark dataset designed to evaluate the knowledge of Large Language Models (LLMs) in telecommunications.
This paper outlines the automated question generation framework responsible for creating this dataset, along with how human input was integrated at various stages to ensure the quality of the questions.
The dataset has been made publicly accessible on GitHub.
- Score: 26.302396162473293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce TeleQnA, the first benchmark dataset designed to evaluate the
knowledge of Large Language Models (LLMs) in telecommunications. Comprising
10,000 questions and answers, this dataset draws from diverse sources,
including standards and research articles. This paper outlines the automated
question generation framework responsible for creating this dataset, along with
how human input was integrated at various stages to ensure the quality of the
questions. Afterwards, using the provided dataset, an evaluation is conducted
to assess the capabilities of LLMs, including GPT-3.5 and GPT-4. The results
highlight that these models struggle with complex standards-related questions
but exhibit proficiency in addressing general telecom-related inquiries.
Additionally, our results showcase how incorporating telecom knowledge context
significantly enhances their performance, thus shedding light on the need for a
specialized telecom foundation model. Finally, the dataset is shared with
active telecom professionals, whose performance is subsequently benchmarked
against that of the LLMs. The findings illustrate that LLMs can rival the
performance of active professionals in telecom knowledge, thanks to their
capacity to process vast amounts of information, underscoring the potential of
LLMs within this domain. The dataset has been made publicly accessible on
GitHub.
Related papers
- Federated Large Language Models: Current Progress and Future Directions [63.68614548512534]
This paper surveys federated learning for LLMs (FedLLM), highlighting recent advances and future directions.
We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and associated research challenges.
arXiv Detail & Related papers (2024-09-24T04:14:33Z)
- TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models [7.015008083968722]
Large Language Models (LLMs) have the potential to revolutionize the Sixth Generation (6G) communication networks.
This paper proposes a pipeline to adapt any general-purpose LLM into a telecom-specific LLM.
We extend existing evaluation benchmarks and propose three new benchmarks, namely Telecom Math Modeling, Telecom Open QnA, and Telecom Code Tasks.
arXiv Detail & Related papers (2024-07-12T16:51:02Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
We propose a dataset curation agent benchmark, DCA-Bench, to measure large language models' capability of detecting hidden dataset quality issues.
Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed.
The proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving.
arXiv Detail & Related papers (2024-06-11T14:02:23Z)
- Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information.
We evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants.
Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z)
- Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
- Datasets for Large Language Models: A Comprehensive Survey [37.153302283062004]
The survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives.
The survey sheds light on the prevailing challenges and points out potential avenues for future investigation.
The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets.
arXiv Detail & Related papers (2024-02-28T04:35:51Z)
- Vision-Language Instruction Tuning: A Review and Analysis [52.218690619616474]
Vision-Language Instruction Tuning (VLIT) presents more complex characteristics compared to pure text instruction tuning.
We offer a detailed categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess.
By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs.
arXiv Detail & Related papers (2023-11-14T14:02:32Z)
- Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation [11.323961700172175]
FACT-GPT is a framework designed to automate the claim matching phase of fact-checking using Large Language Models.
This framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers.
We evaluated FACT-GPT on an extensive dataset of social media content related to public health.
arXiv Detail & Related papers (2023-10-13T16:21:07Z)
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
This paper evaluates Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning.
We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.