Related papers: DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs

URL: http://arxiv.org/abs/2406.18535v1
Date: Mon, 4 Mar 2024 15:04:05 GMT
Title: DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs
Authors: Jinzhe Liu, Xiangsheng Huang, Zhuo Chen, Yin Fang,
Abstract summary: Domain-specific Retrieval-Augmented Knowledge (DRAK) is a non-parametric knowledge injection framework for large language models. DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks. Our code will be available soon.
Score: 6.728130796437259
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) encounter challenges with the unique syntax of specific domains, such as biomolecules. Existing fine-tuning or modality alignment techniques struggle to bridge the domain knowledge gap and understand complex molecular data, limiting LLMs' progress in specialized fields. To overcome these limitations, we propose an expandable and adaptable non-parametric knowledge injection framework named Domain-specific Retrieval-Augmented Knowledge (DRAK), aimed at enhancing reasoning capabilities in specific domains. Utilizing knowledge-aware prompts and gold label-induced reasoning, DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks. We evaluated two distinct forms of DRAK variants, proving that DRAK exceeds previous benchmarks on six molecular tasks within the Mol-Instructions dataset. Extensive experiments have underscored DRAK's formidable performance and its potential to unlock molecular insights, offering a unified paradigm for LLMs to tackle knowledge-intensive tasks in specific domains. Our code will be available soon.
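The abstract does not say how DRAK builds its knowledge-aware prompts. The sketch below is therefore only an illustration of the general non-parametric, retrieval-augmented idea. It is not DRAK's actual method.

It makes several assumptions:
- The toy knowledge base is hypothetical.
- The bag-of-words cosine retriever is a simple stand-in for whatever retriever a real system would use.
- The helper names `tokenize`, `retrieve`, and `build_prompt` are invented for this example.

The sketch retrieves the snippets most similar to a question and prepends them to the prompt. The model's weights are never changed.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of short molecular facts (illustrative only).
KNOWLEDGE_BASE = [
    "Aspirin (acetylsalicylic acid) has the SMILES CC(=O)OC1=CC=CC=C1C(=O)O.",
    "Caffeine is a methylxanthine alkaloid that acts as a central nervous system stimulant.",
    "Ethanol has the SMILES CCO and is a primary alcohol.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,()?").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (stand-in retriever)."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q_vec, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Prepend retrieved knowledge snippets to the question; no weights are updated."""
    snippets = retrieve(question, docs, k)
    knowledge = "\n".join(f"- {s}" for s in snippets)
    return f"Relevant domain knowledge:\n{knowledge}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What is the SMILES of ethanol?", KNOWLEDGE_BASE, k=1))
```

Knowledge enters only through the prompt. The knowledge base can therefore be extended without retraining, which is what "non-parametric" means in the abstract. A practical system would more likely use dense embeddings than word counts.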

Related papers

Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey [39.82566660592583]
Large Language Models (LLMs) have demonstrated remarkable success in various tasks such as natural language understanding, text summarization, and machine translation. Their general-purpose nature often limits their effectiveness in domain-specific applications that require specialized knowledge, such as healthcare, chemistry, or legal analysis. To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge.
(2025-02-15T07:43:43Z)
Way to Specialist: Closing Loop Between Specialized LLM and Evolving Domain Knowledge Graph [66.98553434041708]
The Way-to-Specialist (WTS) framework synergizes retrieval-augmented generation with knowledge graphs. Its "LLM$\circlearrowright$KG" paradigm achieves bidirectional enhancement between the specialized LLM and the domain knowledge graph.
(2024-11-28T11:24:43Z)
MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction [44.27112553103388]
We present Molecule Caption Arena: the first comprehensive benchmark of large language model (LLM)-augmented molecular property prediction. We evaluate over twenty LLMs, including both general-purpose and domain-specific molecule captioners, across diverse prediction tasks. Our findings confirm the ability of LLM-extracted knowledge to enhance state-of-the-art molecular representations.
(2024-11-01T17:03:16Z)
GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning framework that integrates the parametric and non-parametric memories. Our method facilitates a more logical and step-wise reasoning approach akin to experts' problem-solving, rather than gold answer retrieval.
(2024-10-11T03:05:06Z)
Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora. LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions. We introduce SMART-SLIC, a highly domain-specific LLM framework.
(2024-10-03T17:40:55Z)
Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to empirically explain the performance gap.
(2024-09-27T05:06:43Z)
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models [12.744381867301353]
We propose a novel Molecular Graph representation learning framework that integrates Large language models and Domain-specific small models. We employ a multi-modal alignment method to coordinate various modalities, including molecular graphs and their corresponding descriptive texts, to guide the pre-training of molecular representations.
(2024-08-19T16:11:59Z)
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [9.600277231719874]
Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. This work explores how to repurpose general LLMs into effective task solvers for specialized domains.
(2024-02-06T20:11:54Z)
Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
(2023-11-16T07:09:38Z)
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks. In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation, realized in a framework called MolReGPT. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
(2023-06-11T08:16:25Z)
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP). They provide a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
(2023-05-30T03:00:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.