MolGround: A Benchmark for Molecular Grounding
- URL: http://arxiv.org/abs/2503.23668v4
- Date: Thu, 01 May 2025 01:07:04 GMT
- Title: MolGround: A Benchmark for Molecular Grounding
- Authors: Jiaxin Wu, Ting Zhang, Rubing Chen, Wengyu Zhang, Chen Jason Zhang, Xiao-Yong Wei, Li Qing
- Abstract summary: We propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We construct the largest molecular understanding benchmark to date, comprising 117k QA pairs. Our system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks.
- Score: 10.242626234027755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current molecular understanding approaches predominantly focus on the descriptive aspect of human perception, providing broad, topic-level insights. However, the referential aspect -- linking molecular concepts to specific structural components -- remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Furthermore, we constructed the largest molecular understanding benchmark to date, comprising 117k QA pairs, and developed a multi-agent grounding prototype as proof of concept. This system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks such as molecular captioning and ATC (Anatomical, Therapeutic, Chemical) classification.
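As a rough illustration of the referential aspect the abstract describes, a grounding QA pair links a textual concept to specific atoms in a molecule rather than describing it at topic level. The schema below is hypothetical (the abstract does not specify MolGround's data format); field names and the helper function are illustrative only.

```python
# Hypothetical sketch of a molecular grounding QA pair: the answer
# "grounds" a textual concept (here, a carboxyl group) to concrete
# atom indices in the molecule's SMILES string. Field names are
# illustrative, not MolGround's actual schema.

qa_pair = {
    "smiles": "CC(=O)O",            # acetic acid
    "question": "Which atoms form the carboxyl group?",
    "grounded_answer": {
        "concept": "carboxyl group",
        "atom_indices": [1, 2, 3],  # the C(=O)O atoms in the SMILES above
    },
}

def atoms_referenced(pair):
    """Return the set of atom indices the grounded answer refers to."""
    return set(pair["grounded_answer"]["atom_indices"])

print(sorted(atoms_referenced(qa_pair)))  # [1, 2, 3]
```

The point of the structure is that an evaluator can score referential accuracy by comparing predicted atom index sets against gold ones, instead of comparing free-form descriptions.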
Related papers
- Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that grasps general knowledge centered on molecules. We introduce a module that integrates complementary information from different molecular encoders. Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
arXiv Detail & Related papers (2025-02-19T05:49:10Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in encoding molecular graphs. KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
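The summary above mentions a dual message-passing mechanism over heterogeneous graph views. As a generic illustration only (KCHML's actual update rule is not detailed in the abstract), a single message-passing round can be sketched as each node aggregating its neighbors' features:

```python
# Generic single round of graph message passing, illustrative only.
# Each node's new feature is its own feature plus the sum of its
# neighbors' features; real models learn weighted, multi-view updates.

adjacency = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path graph
features = {0: 1.0, 1: 2.0, 2: 3.0}       # toy scalar node features

def message_pass(adjacency, features):
    """One aggregation step: self feature + sum of neighbor features."""
    return {v: features[v] + sum(features[u] for u in adjacency[v])
            for v in adjacency}

print(message_pass(adjacency, features))
# {0: 3.0, 1: 6.0, 2: 5.0}
```

A heterogeneous variant would run such updates separately per graph view (molecular, elemental, pharmacological) and combine the results.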
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
- FragNet: A Graph Neural Network for Molecular Property Prediction with Four Layers of Interpretability [0.7499722271664147]
We introduce the FragNet architecture, a graph neural network capable of achieving prediction accuracies comparable to the current state-of-the-art models.
This model enables understanding of which atoms, covalent bonds, molecular fragments, and molecular fragment connections are critical in the prediction of a given molecular property.
The interpretable capabilities of FragNet are key to gaining scientific insights from the model's learned patterns between molecular structure and molecular properties.
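To make the four interpretability levels concrete: FragNet's abstract describes importance at the level of atoms, covalent bonds, fragments, and fragment connections. The sketch below shows only the generic idea of pooling atom-level attribution scores up to fragment level; the scores, fragment names, and pooling rule are hypothetical, not FragNet's actual mechanism.

```python
# Hypothetical multi-level attribution sketch: pool per-atom
# importance scores into per-fragment scores. All numbers and
# groupings are illustrative, not taken from the FragNet paper.

atom_scores = {0: 0.10, 1: 0.55, 2: 0.30, 3: 0.05}   # toy attributions
fragments = {"methyl": [0], "carboxyl": [1, 2, 3]}   # atom index groups

def fragment_scores(atom_scores, fragments):
    """Pool atom-level attributions into fragment-level ones."""
    return {name: round(sum(atom_scores[i] for i in idx), 2)
            for name, idx in fragments.items()}

print(fragment_scores(atom_scores, fragments))
# {'methyl': 0.1, 'carboxyl': 0.9}
```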
arXiv Detail & Related papers (2024-10-16T01:37:01Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z) - Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce TSMMG, a multi-constraint molecular generation large language model that plays the role of a student in a teacher-student paradigm.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from its 'teachers'.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z) - MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce Atomic and Subgraph-aware Bilateral Aggregation (ASBA), a new model for molecular representation learning.
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Geometric Deep Learning on Molecular Representations [0.0]
Geometric deep learning (GDL) is based on neural network architectures that incorporate and process symmetry information.
This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry.
Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors.
arXiv Detail & Related papers (2021-07-26T09:23:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.