Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
- URL: http://arxiv.org/abs/2505.17037v1
- Date: Sat, 10 May 2025 08:40:04 GMT
- Title: Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
- Authors: Dimitri Schreiter
- Abstract summary: This thesis addresses the problem of whether increasing the specificity of vocabulary in prompts improves domain-specific question-answering and reasoning tasks. We developed a synonymization framework to systematically substitute nouns, verbs, and adjectives with varying specificity levels, measuring the impact on four large language models (LLMs). Our results reveal that while increasing the specificity of prompts generally does not have a significant impact, there appears to be a specificity range, consistent across all considered models, in which the LLMs perform best.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Prompt engineering has emerged as a critical component in optimizing large language models (LLMs) for domain-specific tasks. However, the role of prompt specificity, especially in domains like STEM (physics, chemistry, biology, computer science, and mathematics), medicine, and law, remains underexplored. This thesis addresses the problem of whether increasing the specificity of vocabulary in prompts improves LLM performance in domain-specific question-answering and reasoning tasks. We developed a synonymization framework to systematically substitute nouns, verbs, and adjectives with varying specificity levels, measuring the impact on four LLMs: Llama-3.1-70B-Instruct, Granite-13B-Instruct-V2, Flan-T5-XL, and Mistral-Large 2, across datasets in STEM, law, and medicine. Our results reveal that while increasing the specificity of prompts generally does not have a significant impact, there appears to be a specificity range, consistent across all considered models, in which the LLMs perform best. Identifying this optimal specificity range offers a key insight for prompt design, suggesting that manipulating prompts within this range could maximize LLM performance and lead to more efficient applications in specialized domains.
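The abstract describes the synonymization framework only at a high level. As an illustration of one way such a substitution step could be implemented, here is a minimal Python sketch that uses WordNet synset depth as a proxy for specificity; the function names and the depth heuristic are assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of a WordNet-based synonymization step, assuming NLTK
# with the "wordnet" corpus downloaded (nltk.download("wordnet")).
# All names and the depth heuristic are hypothetical; the thesis's
# actual framework may differ.
from nltk.corpus import wordnet as wn

def candidates_by_specificity(word, pos=wn.NOUN):
    """Collect (lemma, depth) pairs for the word's synsets plus their
    hypernyms (more general) and hyponyms (more specific), using synset
    depth in the WordNet hierarchy as a rough specificity proxy."""
    seen, candidates = set(), []
    for synset in wn.synsets(word, pos=pos):
        for related in [synset] + synset.hypernyms() + synset.hyponyms():
            for lemma in related.lemmas():
                name = lemma.name().replace("_", " ")
                if name not in seen:
                    seen.add(name)
                    candidates.append((name, related.min_depth()))
    return candidates

def substitute(word, target_depth):
    """Replace `word` with the candidate whose depth is closest to the
    requested specificity level; fall back to the original word."""
    candidates = candidates_by_specificity(word)
    if not candidates:
        return word
    return min(candidates, key=lambda c: abs(c[1] - target_depth))[0]

# Example: request a fairly specific substitute for a common noun.
print(substitute("doctor", target_depth=9))
```

Sweeping `target_depth` over a range of values and re-scoring the model on each rewritten prompt would then approximate the specificity sweep described in the abstract.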
Related papers
- Diverse Prompts: Illuminating the Prompt Space of Large Language Models with MAP-Elites [2.529560284922988]
This work introduces an evolutionary approach that combines context-free grammar (CFG) with the MAP-Elites algorithm to explore the prompt space.
Our method prioritizes quality and diversity, generating high-performing and structurally varied prompts.
arXiv Detail & Related papers (2025-04-19T17:50:34Z)
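For context on the MAP-Elites side of the entry above, the following is a minimal, self-contained sketch of the algorithm's core loop applied to prompts; the mutation operator, behavior descriptor, and fitness stub are invented stand-ins, and the paper's context-free grammar machinery is omitted entirely.

```python
# A minimal MAP-Elites sketch over a prompt space. The mutation operator,
# descriptor, and fitness function are hypothetical stand-ins; the paper
# additionally constrains prompts with a context-free grammar.
import random

def mutate(prompt):
    """Toy mutation: reorder the prompt's sentence-level clauses."""
    parts = prompt.split(". ")
    random.shuffle(parts)
    return ". ".join(parts)

def descriptor(prompt):
    """Behavior descriptor: (length bucket, clause-count bucket)."""
    return (min(len(prompt) // 50, 9), min(prompt.count("."), 9))

def map_elites(seed_prompts, fitness, iterations=1000):
    """Keep only the best-scoring prompt per descriptor cell, so the
    archive stays both high-performing and structurally diverse."""
    archive = {}  # cell -> (score, prompt)

    def consider(prompt):
        cell, score = descriptor(prompt), fitness(prompt)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, prompt)

    for prompt in seed_prompts:
        consider(prompt)
    for _ in range(iterations):
        _, parent = random.choice(list(archive.values()))
        consider(mutate(parent))
    return archive
```

In practice, `fitness` would call the target LLM and score its answers on a held-out task, which is the expensive step this archive-based search tries to spend wisely.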
- Way to Specialist: Closing Loop Between Specialized LLM and Evolving Domain Knowledge Graph [66.98553434041708]
The Way-to-Specialist (WTS) framework synergizes retrieval-augmented generation with knowledge graphs.
The "LLM$\circlearrowright$KG" paradigm achieves bidirectional enhancement between the specialized LLM and the domain knowledge graph.
arXiv Detail & Related papers (2024-11-28T11:24:43Z)
- Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora.
LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions.
We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLM) to enhance their capabilities across various downstream tasks in NLP.
We then propose a model-adaptive prompt optimization (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [9.600277231719874]
Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language.
This work explores how to repurpose general LLMs into effective task solvers for specialized domains.
arXiv Detail & Related papers (2024-02-06T20:11:54Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
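To make the three-step DOKE pipeline above concrete, here is a hypothetical end-to-end sketch; the function names and the word-overlap selector are invented placeholders, not the paper's actual components.

```python
# A hypothetical sketch of the three DOKE steps described above. All
# names and the word-overlap selector are invented placeholders.
def prepare_knowledge(corpus):
    """Step 1: prepare effective knowledge for the task (here, trivially,
    by splitting documents into candidate fact sentences)."""
    return [s.strip() for doc in corpus for s in doc.split(".") if s.strip()]

def select_knowledge(sample, facts, k=3):
    """Step 2: select the knowledge most relevant to this sample,
    using a toy word-overlap score in place of a real retriever."""
    def overlap(fact):
        return len(set(sample.lower().split()) & set(fact.lower().split()))
    return sorted(facts, key=overlap, reverse=True)[:k]

def express(sample, facts):
    """Step 3: express the selected knowledge in an LLM-understandable
    way, as a plain-text prefix to the user's question."""
    bullets = "\n".join(f"- {f}" for f in facts)
    return f"Relevant domain knowledge:\n{bullets}\n\nQuestion: {sample}"

# Chaining the three steps yields an augmented prompt for the LLM.
question = "What does aspirin inhibit?"
facts = prepare_knowledge(["Aspirin inhibits COX enzymes. Ibuprofen is an NSAID."])
print(express(question, select_knowledge(question, facts)))
```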
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.