MolEdit: Knowledge Editing for Multimodal Molecule Language Models
- URL: http://arxiv.org/abs/2511.12770v1
- Date: Sun, 16 Nov 2025 20:48:37 GMT
- Title: MolEdit: Knowledge Editing for Multimodal Molecule Language Models
- Authors: Zhenyu Lei, Patrick Soga, Yaochen Zhu, Yinhan He, Yushun Dong, Jundong Li,
- Abstract summary: MolEdit is a framework for molecule-to-caption generation and caption-to-molecule generation.<n>MolEdit combines a Multi-Expert Knowledge Adapter that routes edits to specialized experts for different molecular facets with an Expertise-Aware Editing Switcher.<n>MolEdit delivers up to 18.8% higher Reliability and 12.0% better Locality than baselines while maintaining efficiency.
- Score: 57.85765246726558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding and continuously refining multimodal molecular knowledge is crucial for advancing biomedicine, chemistry, and materials science. Molecule language models (MoLMs) have become powerful tools in these domains, integrating structural representations (e.g., SMILES strings, molecular graphs) with rich contextual descriptions (e.g., physicochemical properties). However, MoLMs can encode and propagate inaccuracies due to outdated web-mined training corpora or malicious manipulation, jeopardizing downstream discovery pipelines. While knowledge editing has been explored for general-domain AI, its application to MoLMs remains uncharted, presenting unique challenges due to the multifaceted and interdependent nature of molecular knowledge. In this paper, we take the first step toward MoLM editing for two critical tasks: molecule-to-caption generation and caption-to-molecule generation. To address molecule-specific challenges, we propose MolEdit, a powerful framework that enables targeted modifications while preserving unrelated molecular knowledge. MolEdit combines a Multi-Expert Knowledge Adapter that routes edits to specialized experts for different molecular facets with an Expertise-Aware Editing Switcher that activates the adapters only when input closely matches the stored edits across all expertise, minimizing interference with unrelated knowledge. To systematically evaluate editing performance, we introduce MEBench, a comprehensive benchmark assessing multiple dimensions, including Reliability (accuracy of the editing), Locality (preservation of irrelevant knowledge), and Generality (robustness to reformed queries). Across extensive experiments on two popular MoLM backbones, MolEdit delivers up to 18.8% higher Reliability and 12.0% better Locality than baselines while maintaining efficiency. The code is available at: https://github.com/LzyFischer/MolEdit.
Related papers
- KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge [73.51130155601824]
We introduce KnowMol-100K, a large-scale dataset with 100K fine-grained molecular annotations across multiple levels.<n>We also propose chemically-informative molecular representation, effectively addressing limitations in existing molecular representation strategies.<n>KnowMol achieves superior performance across molecular understanding and generation tasks.
arXiv Detail & Related papers (2025-10-22T11:23:58Z) - MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation [14.1130924371878]
MolLangBench is a benchmark designed to evaluate fundamental molecule-language interface tasks.<n>We construct the recognition tasks using automated cheminformatics tools.<n>We curate editing and generation tasks through rigorous expert annotation and validation.
arXiv Detail & Related papers (2025-05-21T03:22:01Z) - Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [52.84455878597969]
Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules.<n>To improve molecular understanding, we propose a module that integrates complementary information from different molecular encoders.
arXiv Detail & Related papers (2025-02-19T05:49:10Z) - AnyEdit: Edit Any Knowledge Encoded in Language Models [76.28789588247659]
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs)<n>It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs.<n>It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv Detail & Related papers (2025-02-08T16:18:37Z) - MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction [44.27112553103388]
We present Molecule Caption Arena: the first comprehensive benchmark of large language models (LLMs)augmented molecular property prediction.
We evaluate over twenty LLMs, including both general-purpose and domain-specific molecule captioners, across diverse prediction tasks.
Our findings confirm the ability of LLM-extracted knowledge to enhance state-of-the-art molecular representations.
arXiv Detail & Related papers (2024-11-01T17:03:16Z) - MolX: Enhancing Large Language Models for Molecular Understanding With A Multi-Modal Extension [44.97089022713424]
Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields.<n>This study seeks to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, termed MolX.<n>A hand-crafted molecular fingerprint is incorporated to leverage its embedded domain knowledge.
arXiv Detail & Related papers (2024-06-10T20:25:18Z) - Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'
We experimentally show that TSMMG remarkably performs in generating molecules meeting complex, natural language-described property requirements.
arXiv Detail & Related papers (2024-03-20T02:15:55Z) - Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.