Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
- URL: http://arxiv.org/abs/2402.05140v3
- Date: Fri, 26 Jul 2024 01:28:16 GMT
- Title: Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
- Authors: Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, Nicolo Fusi
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language.
This work explores how to repurpose general LLMs into effective task solvers for specialized domains.
- Score: 9.600277231719874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts the LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.
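To make the tag mechanism concrete, below is a minimal sketch of how learnable domain and function tag vectors could condition a frozen, Hugging Face-style causal LM. The class name, the `tokens_per_tag` parameter, and the prepend-only insertion scheme are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class TaggedLLM(nn.Module):
    """Minimal sketch: trainable domain/function tag embeddings that condition
    a frozen general-purpose LLM (illustrative, not the paper's exact code)."""

    def __init__(self, base_model, tag_names, tokens_per_tag=4):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False          # keep the general LLM frozen
        hidden = base_model.get_input_embeddings().embedding_dim
        # One block of continuous vectors per tag, e.g. "protein" (domain)
        # or "binding_affinity" (function); only these vectors are trained.
        self.tag_embeddings = nn.ParameterDict({
            name: nn.Parameter(torch.randn(tokens_per_tag, hidden) * 0.02)
            for name in tag_names
        })

    def forward(self, input_ids, tag_names):
        # Embed ordinary tokens with the frozen embedding layer, then prepend
        # the learned tag vectors (e.g., domain tag first, then function tag).
        tok_emb = self.base_model.get_input_embeddings()(input_ids)
        tag_emb = torch.cat([self.tag_embeddings[n] for n in tag_names], dim=0)
        tag_emb = tag_emb.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
        inputs_embeds = torch.cat([tag_emb, tok_emb], dim=1)
        return self.base_model(inputs_embeds=inputs_embeds)
```

Per the abstract's three-stage protocol, domain tags would be learned first from auxiliary in-domain data and function tags afterwards from supervised task data, so that new (domain, function) combinations can be composed zero-shot at inference time.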
Related papers
- Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation [56.78444462585225]
TESSA is a multi-agent system designed to automatically generate both general and domain-specific annotations for time series data.
The general agent captures common patterns and knowledge across multiple source domains, leveraging both time-series-wise and text-wise features.
The domain-specific agent utilizes limited annotations from the target domain to learn domain-specific terminology and generate targeted annotations.
arXiv Detail & Related papers (2024-10-22T22:43:14Z) - Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora.
LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions.
We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z) - More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs [40.54076184225558]
Performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge for the real-world application of domain-specific LLMs that goes beyond CF, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
arXiv Detail & Related papers (2024-05-28T05:00:12Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs [6.728130796437259]
Domain-specific Retrieval-Augmented Knowledge (DRAK) is a non-parametric knowledge injection framework for large language models.
DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks.
Our code will be available soon.
arXiv Detail & Related papers (2024-03-04T15:04:05Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
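As a rough illustration of that three-step pattern, here is a toy sketch in Python; the function names and the keyword-overlap selector are assumptions made for exposition, not DOKE's actual components.

```python
def prepare_knowledge(corpus):
    """Step 1: turn raw domain data into candidate knowledge snippets."""
    return [line.strip() for line in corpus if line.strip()]

def select_knowledge(sample, snippets, k=3):
    """Step 2: pick the snippets most relevant to this sample
    (toy keyword-overlap scoring; real systems use retrievers or graphs)."""
    words = set(sample.lower().split())
    ranked = sorted(snippets,
                    key=lambda s: len(words & set(s.lower().split())),
                    reverse=True)
    return ranked[:k]

def express_knowledge(sample, selected):
    """Step 3: verbalize the selected knowledge into the prompt of a frozen LLM."""
    facts = "\n".join(f"- {s}" for s in selected)
    return f"Relevant domain facts:\n{facts}\n\nQuestion: {sample}"
```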
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Combining Language Models For Specialized Domains: A Colorful Approach [14.124988885323585]
We introduce a novel approach that integrates a domain-specific or secondary LM into a general-purpose LM.
This strategy involves labeling, or "coloring", each word to indicate its association with either the general or the domain-specific LM.
We develop an optimized algorithm that extends beam search to handle inference involving colored words.
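A toy sketch of the color-based routing idea follows; the `log_prob` interface and the hard routing rule are assumptions for illustration, not the paper's optimized beam-search algorithm.

```python
def combined_log_prob(tokens, colors, general_lm, domain_lm):
    """Score a hypothesis word by word, routing each word to the LM that
    matches its color. general_lm and domain_lm are assumed to expose a
    log_prob(word, context) method (hypothetical interface)."""
    total = 0.0
    for i, (word, color) in enumerate(zip(tokens, colors)):
        lm = domain_lm if color == "domain" else general_lm
        total += lm.log_prob(word, tokens[:i])
    return total
```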
arXiv Detail & Related papers (2023-10-30T16:35:55Z) - G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks [68.87524746922263]
We propose a new framework, the General Memory-Augmented Pre-trained Language Model (G-MAP).
G-MAP augments the domain-specific PLM with a memory representation built from the frozen general PLM, without losing any general knowledge.
We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER).
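One hedged reading of "memory-augmented" is a cross-attention from the domain-specific PLM's hidden states to a memory built from the frozen general PLM's hidden states, as sketched below; the single attention layer and residual fusion are illustrative assumptions, not G-MAP's exact coupling mechanism.

```python
import torch.nn as nn

class GeneralMemoryAugmentation(nn.Module):
    """Sketch: fuse a frozen general PLM's hidden states (the "memory")
    into a domain-specific PLM via cross-attention (illustrative only)."""

    def __init__(self, hidden_size, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads,
                                                batch_first=True)

    def forward(self, domain_hidden, general_memory):
        # domain_hidden, general_memory: (batch, seq_len, hidden_size)
        attended, _ = self.cross_attn(query=domain_hidden,
                                      key=general_memory,
                                      value=general_memory)
        return domain_hidden + attended  # residual fusion of general knowledge
```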
arXiv Detail & Related papers (2022-12-07T13:07:24Z) - Set-based Meta-Interpolation for Few-Task Meta-Learning [79.4236527774689]
We propose a novel domain-agnostic task augmentation method, Meta-Interpolation, to densify the meta-training task distribution.
We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning across various domains.
arXiv Detail & Related papers (2022-05-20T06:53:03Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
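Taken literally, "modulating intermediate hidden representations with domain knowledge" can be sketched as a knowledge-conditioned scale-and-shift applied to a PLM layer's output; the FiLM-style module below is an illustrative reading of the one-line summary, not KALA's exact formulation.

```python
import torch.nn as nn

class KnowledgeModulation(nn.Module):
    """Sketch: scale and shift a PLM's hidden states using a vector derived
    from domain knowledge (e.g., pooled entity embeddings). Illustrative only."""

    def __init__(self, hidden_size, knowledge_dim):
        super().__init__()
        self.to_scale = nn.Linear(knowledge_dim, hidden_size)
        self.to_shift = nn.Linear(knowledge_dim, hidden_size)

    def forward(self, hidden_states, knowledge_vec):
        # hidden_states: (batch, seq_len, hidden); knowledge_vec: (batch, knowledge_dim)
        scale = 1.0 + self.to_scale(knowledge_vec).unsqueeze(1)
        shift = self.to_shift(knowledge_vec).unsqueeze(1)
        return scale * hidden_states + shift
```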
arXiv Detail & Related papers (2022-04-22T08:11:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.