Telecom Language Models: Must They Be Large?
- URL: http://arxiv.org/abs/2403.04666v2
- Date: Tue, 25 Jun 2024 09:28:43 GMT
- Title: Telecom Language Models: Must They Be Large?
- Authors: Nicola Piovesan, Antonio De Domenico, Fadhel Ayed
- Abstract summary: Small language models now exhibit performance comparable to their larger counterparts in many tasks.
Phi-2 is a compact yet powerful model that exemplifies this new wave of efficient small language models.
This paper conducts a comprehensive evaluation of Phi-2's intrinsic understanding of the telecommunications domain.
- Score: 7.82773820037707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing interest in Large Language Models (LLMs) within the telecommunications sector underscores their potential to revolutionize operational efficiency. However, the deployment of these sophisticated models is often hampered by their substantial size and computational demands, raising concerns about their viability in resource-constrained environments. Addressing this challenge, recent advancements have seen the emergence of small language models that surprisingly exhibit performance comparable to their larger counterparts in many tasks, such as coding and common-sense reasoning. Phi-2, a compact yet powerful model, exemplifies this new wave of efficient small language models. This paper conducts a comprehensive evaluation of Phi-2's intrinsic understanding of the telecommunications domain. Recognizing the scale-related limitations, we enhance Phi-2's capabilities through a Retrieval-Augmented Generation approach, meticulously integrating an extensive knowledge base specifically curated with telecom standard specifications. The enhanced Phi-2 model demonstrates a profound improvement in accuracy, answering questions about telecom standards with a precision that closely rivals the more resource-intensive GPT-3.5. The paper further explores the refined capabilities of Phi-2 in addressing problem-solving scenarios within the telecom sector, highlighting its potential and limitations.
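To make the approach concrete, below is a minimal sketch of the retrieval-augmented generation idea described in the abstract: chunks of telecom specification text are scored against the question and the best matches are prepended to the prompt before a small model such as Phi-2 answers. The chunking, the TF-IDF-style scoring, and the prompt wording are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch of RAG over telecom specifications (illustrative, not the paper's code).
from collections import Counter
import math

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def relevance(query: str, chunk: str, corpus: list[str]) -> float:
    """Toy TF-IDF-style lexical score between the query and one corpus chunk."""
    chunk_counts = Counter(tokenize(chunk))
    n_docs = len(corpus)
    score = 0.0
    for term in set(tokenize(query)):
        tf = chunk_counts.get(term, 0)
        df = sum(1 for c in corpus if term in tokenize(c))
        if tf and df:
            score += tf * math.log(n_docs / df)
    return score

def build_rag_prompt(question: str, corpus: list[str], top_k: int = 2) -> str:
    """Pick the top-k most relevant chunks and prepend them to the question."""
    ranked = sorted(corpus, key=lambda c: relevance(question, c, corpus), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context from telecom specifications:\n{context}\n\nQuestion: {question}\nAnswer:"

# The resulting prompt would then be passed to a small model such as Phi-2.
specs = [
    "The gNB schedules PDSCH transmissions via DCI carried on the PDCCH.",
    "NR supports subcarrier spacings of 15, 30, 60, 120 and 240 kHz.",
    "RRC connection establishment uses the random access procedure.",
]
print(build_rag_prompt("Which subcarrier spacings does NR support?", specs))
```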
Related papers
- TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network [4.551436852242372]
We present TeleOracle, a telecom-specialized retrieval-augmented generation (RAG) system built on the Phi-2 small language model (SLM).
To improve context retrieval, TeleOracle employs a two-stage retriever that incorporates semantic chunking and hybrid keyword and semantic search.
A thorough analysis of the model's performance indicates that our RAG framework is effective in aligning Phi-2 to the telecom domain in a downstream question and answer (QnA) task, achieving a 30% improvement in accuracy over the base Phi-2 model.
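The hybrid keyword-plus-semantic retrieval mentioned above can be illustrated with a toy first-stage scorer: each chunk receives a keyword-overlap score and an embedding-similarity score, blended with a weight alpha. This is a generic sketch, not TeleOracle's implementation; `toy_embed` stands in for a real text encoder.

```python
# Toy hybrid retrieval: blend lexical overlap with embedding similarity (illustrative).
import numpy as np

def keyword_score(query: str, chunk: str) -> float:
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def toy_embed(text: str, dim: int = 16) -> np.ndarray:
    """Deterministic pseudo-random vector per text; stands in for a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

def hybrid_rank(query: str, chunks: list[str], alpha: float = 0.5, top_k: int = 3) -> list[str]:
    """First-stage scoring; a second stage could rerank the returned top-k."""
    q_vec = toy_embed(query)
    scored = [
        (alpha * keyword_score(query, c) + (1 - alpha) * cosine(q_vec, toy_embed(c)), c)
        for c in chunks
    ]
    return [c for _, c in sorted(scored, reverse=True)[:top_k]]

chunks = ["HARQ retransmissions in NR", "PDCCH blind decoding", "5G core network slicing"]
print(hybrid_rank("How does HARQ work in NR?", chunks, top_k=2))
```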
arXiv Detail & Related papers (2024-11-04T21:12:08Z) - Rephrase and Contrast: Fine-Tuning Language Models for Enhanced Understanding of Communication and Computer Networks [13.829525575305206]
This paper introduces Rephrase and Contrast (RaC), an efficient fine-tuning framework.
RaC enhances LLMs' comprehension and critical thinking abilities by incorporating question reformulation and contrastive analysis.
To efficiently construct the dataset for RaC fine-tuning, we develop a GPT-assisted data mining method for generating high-quality question-answer pairs.
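As a rough illustration of GPT-assisted construction of such question-answer data, the sketch below asks a helper model for a QA pair, a reformulated question, and a contrastive explanation of a wrong answer. `call_llm` is a hypothetical placeholder and the prompt wording is an assumption, not the authors' prompts.

```python
# Sketch of GPT-assisted QA-pair mining with rephrasing and contrastive analysis.
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an assistant model such as GPT-3.5."""
    return f"<model output for: {prompt[:40]}...>"

def build_rac_example(passage: str) -> dict:
    qa = call_llm(f"Write one question and its answer about:\n{passage}")
    rephrased = call_llm(f"Rephrase the question below without changing its meaning:\n{qa}")
    contrast = call_llm(
        "Give one plausible but wrong answer to the question below and explain why it is wrong:\n"
        + qa
    )
    return {"qa": qa, "rephrased_question": rephrased, "contrastive_analysis": contrast}

example = build_rac_example("In NR, HARQ feedback is carried on PUCCH or PUSCH.")
print(example)
```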
arXiv Detail & Related papers (2024-09-21T16:04:43Z) - QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling [10.42541749928513]
GPT-3.5 has been used in recent work to obtain noteworthy accuracy on telecom-related questions within a Retrieval-Augmented Generation framework.
This paper introduces QMOS, an innovative approach which uses a Question-Masked loss and Option Shuffling trick to enhance the performance of LLMs in answering Multiple-Choice Questions in the telecommunications domain.
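The option-shuffling idea can be sketched at inference time: present the same multiple-choice question several times with the options in different orders and aggregate the model's picks, reducing positional bias. `ask_model` below is a hypothetical stand-in for an LLM call; the voting rule is an assumption, not the QMOS recipe.

```python
# Sketch of option shuffling for multiple-choice questions (not the QMOS training code).
import random
from collections import Counter

def ask_model(question: str, options: list[str]) -> int:
    """Hypothetical placeholder: return the index of the option the model selects."""
    return random.randrange(len(options))  # a real system would prompt an LLM and parse its reply

def answer_with_shuffling(question: str, options: list[str], n_shuffles: int = 5) -> str:
    votes = Counter()
    for _ in range(n_shuffles):
        order = list(range(len(options)))
        random.shuffle(order)                       # present options in a new order
        picked = ask_model(question, [options[i] for i in order])
        votes[order[picked]] += 1                   # map the pick back to the original index
    best_idx, _ = votes.most_common(1)[0]
    return options[best_idx]

options = ["15 kHz", "17 kHz", "25 kHz", "45 kHz"]
print(answer_with_shuffling("Which subcarrier spacing is defined for NR?", options))
```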
arXiv Detail & Related papers (2024-09-21T15:32:10Z) - Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards [4.334100270812517]
Large language models (LLMs) struggle with technical standards in telecommunications.
We propose a fine-tuned retrieval-augmented generation (RAG) system based on the Phi-2 small language model (SLM).
Our experiments demonstrate substantial improvements over existing question-answering approaches in the telecom domain.
arXiv Detail & Related papers (2024-08-21T17:00:05Z) - How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Verification based on large language or multimodal models has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content.
We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer.
Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z) - Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large Language Models (LLMs) have revolutionized natural language processing tasks.
Their deployment in wireless networks still faces challenges, such as a lack of privacy and security protection mechanisms.
We introduce two personalized wireless federated fine-tuning methods with low communication overhead.
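A minimal sketch of the general idea behind communication-efficient federated fine-tuning: clients locally update only a small low-rank adapter and the server averages those adapters instead of full model weights. The shapes and update rule below are illustrative assumptions, not the paper's two methods.

```python
# Sketch of communication-efficient federated fine-tuning with low-rank adapters.
import numpy as np

def local_update(adapter: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Placeholder for one client's local fine-tuning step on private data."""
    grad = np.random.default_rng().normal(size=adapter.shape)  # stand-in gradient
    return adapter - lr * grad

def federated_round(global_adapter: np.ndarray, n_clients: int = 4) -> np.ndarray:
    """Each client updates a copy of the adapter; the server averages the results."""
    client_adapters = [local_update(global_adapter.copy()) for _ in range(n_clients)]
    return np.mean(client_adapters, axis=0)  # only the small adapters are transmitted

adapter = np.zeros((64, 4))          # rank-4 adapter: far fewer parameters than a 64x64 layer
for _ in range(3):
    adapter = federated_round(adapter)
print("adapter norm after 3 rounds:", np.linalg.norm(adapter))
```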
arXiv Detail & Related papers (2024-04-20T02:30:21Z) - Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication [76.04373033082948]
Large Language Models (LLMs) have recently made significant strides in complex reasoning tasks through the Chain-of-Thought technique.
We propose Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem-solving.
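A minimal sketch of cross-model communication in the spirit of EoT: two models answer the same problem, and each then sees the other's latest answer before revising its own. `query_model` is a hypothetical wrapper around an LLM call, and the fixed-round stopping rule is an assumption.

```python
# Sketch of cross-model communication: two models exchange answers over a few rounds.
def query_model(name: str, problem: str, peer_answer: str | None) -> str:
    """Hypothetical LLM call: returns the model's current answer as text."""
    hint = f" (peer said: {peer_answer})" if peer_answer else ""
    return f"[{name}] answer to '{problem}'{hint}"

def exchange_of_thought(problem: str, rounds: int = 3) -> tuple[str, str]:
    ans_a, ans_b = None, None
    for _ in range(rounds):
        new_a = query_model("model_a", problem, ans_b)   # A conditions on B's last answer
        new_b = query_model("model_b", problem, ans_a)   # B conditions on A's last answer
        if new_a == ans_a and new_b == ans_b:            # answers stopped changing: converged
            break
        ans_a, ans_b = new_a, new_b
    return ans_a, ans_b

print(exchange_of_thought("Estimate the link budget for a 3.5 GHz cell edge."))
```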
arXiv Detail & Related papers (2023-12-04T11:53:56Z) - Large Language Models for Telecom: Forthcoming Impact on the Industry [13.456882619578707]
Large Language Models (LLMs), AI-driven models that can achieve general-purpose language understanding and generation, have emerged as a transformative force.
We delve into the inner workings of LLMs, providing insights into their current capabilities and limitations.
We uncover essential research directions that address the distinctive challenges of utilizing LLMs within the telecom domain.
arXiv Detail & Related papers (2023-08-11T08:41:00Z) - FedYolo: Augmenting Federated Learning with Pretrained Transformers [61.56476056444933]
In this work, we investigate pretrained transformers (PTF) to achieve on-device learning goals.
We show that larger scale shrinks the accuracy gaps between alternative approaches and improves robustness.
Finally, larger scale enables clients to solve multiple unrelated tasks simultaneously using a single PTF.
arXiv Detail & Related papers (2023-07-10T21:08:52Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)