Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
- URL: http://arxiv.org/abs/2402.05140v3
- Date: Fri, 26 Jul 2024 01:28:16 GMT
- Title: Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
- Authors: Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, Nicolo Fusi
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language.
This work explores how to repurpose general LLMs into effective task solvers for specialized domains.
- Score: 9.600277231719874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts the LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.
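To make the tag mechanism concrete, below is a minimal sketch of how learnable domain and function tag vectors could condition a frozen, Hugging Face-style causal LM. The class name, the `tokens_per_tag` parameter, and the prepend-only insertion scheme are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class TaggedLLM(nn.Module):
    """Minimal sketch: trainable domain/function tag embeddings that condition
    a frozen general-purpose LLM (illustrative, not the paper's exact code)."""

    def __init__(self, base_model, tag_names, tokens_per_tag=4):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False          # keep the general LLM frozen
        hidden = base_model.get_input_embeddings().embedding_dim
        # One block of continuous vectors per tag, e.g. "protein" (domain)
        # or "binding_affinity" (function); only these vectors are trained.
        self.tag_embeddings = nn.ParameterDict({
            name: nn.Parameter(torch.randn(tokens_per_tag, hidden) * 0.02)
            for name in tag_names
        })

    def forward(self, input_ids, tag_names):
        # Embed ordinary tokens with the frozen embedding layer, then prepend
        # the learned tag vectors (e.g., domain tag first, then function tag).
        tok_emb = self.base_model.get_input_embeddings()(input_ids)
        tag_emb = torch.cat([self.tag_embeddings[n] for n in tag_names], dim=0)
        tag_emb = tag_emb.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
        inputs_embeds = torch.cat([tag_emb, tok_emb], dim=1)
        return self.base_model(inputs_embeds=inputs_embeds)
```

Per the abstract's three-stage protocol, domain tags would be learned first from auxiliary in-domain data and function tags afterwards from supervised task data, so that new (domain, function) combinations can be composed zero-shot at inference time.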
Related papers
- Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation [56.78444462585225]
TESSA is a multi-agent system designed to automatically generate both general and domain-specific annotations for time series data.
The general agent captures common patterns and knowledge across multiple source domains, leveraging both time-series-wise and text-wise features.
The domain-specific agent utilizes limited annotations from the target domain to learn domain-specific terminology and generate targeted annotations.
arXiv Detail & Related papers (2024-10-22T22:43:14Z) - Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora.
LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions.
We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z) - More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs [40.54076184225558]
Performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge for the real-world application of domain-specific LLMs that goes beyond CF, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
arXiv Detail & Related papers (2024-05-28T05:00:12Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs [6.728130796437259]
Domain-specific Retrieval-Augmented Knowledge (DRAK) is a non-parametric knowledge injection framework for large language models.
DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks.
Our code will be available soon.
arXiv Detail & Related papers (2024-03-04T15:04:05Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
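As a rough illustration of that three-step pattern, here is a toy sketch in Python; the function names and the keyword-overlap selector are assumptions made for exposition, not DOKE's actual components.

```python
def prepare_knowledge(corpus):
    """Step 1: turn raw domain data into candidate knowledge snippets."""
    return [line.strip() for line in corpus if line.strip()]

def select_knowledge(sample, snippets, k=3):
    """Step 2: pick the snippets most relevant to this sample
    (toy keyword-overlap scoring; real systems use retrievers or graphs)."""
    words = set(sample.lower().split())
    ranked = sorted(snippets,
                    key=lambda s: len(words & set(s.lower().split())),
                    reverse=True)
    return ranked[:k]

def express_knowledge(sample, selected):
    """Step 3: verbalize the selected knowledge into the prompt of a frozen LLM."""
    facts = "\n".join(f"- {s}" for s in selected)
    return f"Relevant domain facts:\n{facts}\n\nQuestion: {sample}"
```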
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Combining Language Models For Specialized Domains: A Colorful Approach [14.124988885323585]
We introduce a novel approach that integrates a domain-specific or secondary LM into a general-purpose LM.
This strategy involves labeling, or "coloring", each word to indicate its association with either the general or the domain-specific LM.
We develop an optimized algorithm that extends beam search to handle inference involving colored words.
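A toy sketch of the color-based routing idea follows; the `log_prob` interface and the hard routing rule are assumptions for illustration, not the paper's optimized beam-search algorithm.

```python
def combined_log_prob(tokens, colors, general_lm, domain_lm):
    """Score a hypothesis word by word, routing each word to the LM that
    matches its color. general_lm and domain_lm are assumed to expose a
    log_prob(word, context) method (hypothetical interface)."""
    total = 0.0
    for i, (word, color) in enumerate(zip(tokens, colors)):
        lm = domain_lm if color == "domain" else general_lm
        total += lm.log_prob(word, tokens[:i])
    return total
```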
arXiv Detail & Related papers (2023-10-30T16:35:55Z) - G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks [68.87524746922263]
We propose a new framework, the General Memory-Augmented Pre-trained Language Model (G-MAP).
G-MAP augments the domain-specific PLM with a memory representation built from the frozen general PLM, without losing any general knowledge.
We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER).
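One hedged reading of "memory-augmented" is a cross-attention from the domain-specific PLM's hidden states to a memory built from the frozen general PLM's hidden states, as sketched below; the single attention layer and residual fusion are illustrative assumptions, not G-MAP's exact coupling mechanism.

```python
import torch.nn as nn

class GeneralMemoryAugmentation(nn.Module):
    """Sketch: fuse a frozen general PLM's hidden states (the "memory")
    into a domain-specific PLM via cross-attention (illustrative only)."""

    def __init__(self, hidden_size, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads,
                                                batch_first=True)

    def forward(self, domain_hidden, general_memory):
        # domain_hidden, general_memory: (batch, seq_len, hidden_size)
        attended, _ = self.cross_attn(query=domain_hidden,
                                      key=general_memory,
                                      value=general_memory)
        return domain_hidden + attended  # residual fusion of general knowledge
```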
arXiv Detail & Related papers (2022-12-07T13:07:24Z) - Set-based Meta-Interpolation for Few-Task Meta-Learning [79.4236527774689]
We propose a novel domain-agnostic task augmentation method, Meta-Interpolation, to densify the meta-training task distribution.
We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning across various domains.
arXiv Detail & Related papers (2022-05-20T06:53:03Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
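Taken literally, "modulating intermediate hidden representations with domain knowledge" can be sketched as a knowledge-conditioned scale-and-shift applied to a PLM layer's output; the FiLM-style module below is an illustrative reading of the one-line summary, not KALA's exact formulation.

```python
import torch.nn as nn

class KnowledgeModulation(nn.Module):
    """Sketch: scale and shift a PLM's hidden states using a vector derived
    from domain knowledge (e.g., pooled entity embeddings). Illustrative only."""

    def __init__(self, hidden_size, knowledge_dim):
        super().__init__()
        self.to_scale = nn.Linear(knowledge_dim, hidden_size)
        self.to_shift = nn.Linear(knowledge_dim, hidden_size)

    def forward(self, hidden_states, knowledge_vec):
        # hidden_states: (batch, seq_len, hidden); knowledge_vec: (batch, knowledge_dim)
        scale = 1.0 + self.to_scale(knowledge_vec).unsqueeze(1)
        shift = self.to_shift(knowledge_vec).unsqueeze(1)
        return scale * hidden_states + shift
```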
arXiv Detail & Related papers (2022-04-22T08:11:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.