When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings
- URL: http://arxiv.org/abs/2403.12984v2
- Date: Wed, 27 Mar 2024 21:51:03 GMT
- Title: When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings
- Authors: Azmine Toushik Wasi, Šerbetar Karlo, Raima Islam, Taki Hasan Rafi, Dong-Kyu Chae
- Abstract summary: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds.
Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification?
The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types.
- Score: 5.648318448953635
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
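To make the idea of reading a SMILES string as a sentence concrete, here is a minimal sketch that splits a SMILES string into atom and bond "words" and counts token n-grams. It is not the authors' pipeline (their code is in the repository linked above); the regular expression, the function names `tokenize` and `ngram_counts`, and the choice of bigrams are assumptions made for illustration, and the pattern covers only common SMILES symbols.

```python
import re
from collections import Counter

# Hypothetical token pattern (an assumption, not taken from the paper).
# It covers common SMILES symbols only, not the full grammar.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [C@H] or [NH4+]
    r"|Br|Cl"               # two-letter organic-subset atoms
    r"|[BCNOSPFI]"          # one-letter aliphatic atoms
    r"|[bcnosp]"            # aromatic atoms
    r"|%\d{2}|\d"           # ring-closure labels
    r"|[()=#\-+\\/:.~@*]"   # branches, bonds, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into a list of atom/bond tokens ("words")."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify the round trip.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def ngram_counts(tokens, n=2):
    """Count contiguous token n-grams, a simple bag-of-words style feature."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    words = tokenize(aspirin)
    print(words)
    print(ngram_counts(words, 2).most_common(3))
```

Count vectors of this kind can be passed to any standard text classifier; which features and models the paper actually uses is not stated in this listing.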
Related papers
- RFL: Simplifying Chemical Structure Recognition with Ring-Free Language [66.47173094346115]
We propose a novel Ring-Free Language (RFL) to describe chemical structures in a hierarchical form.
RFL allows complex molecular structures to be decomposed into multiple parts, ensuring both uniqueness and conciseness.
We propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings.
arXiv Detail & Related papers (2024-12-10T15:29:32Z)
- DrugCLIP: Contrastive Drug-Disease Interaction For Drug Repurposing [4.969453745531116]
DrugCLIP is a contrastive learning method that learns drug-disease interactions without negative labels.
We have curated a drug repurposing dataset based on real-world clinical trial records.
arXiv Detail & Related papers (2024-07-02T13:41:59Z)
- Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications [6.832024637226738]
We propose a new task, which is the translation between drug molecules and corresponding indications.
The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases.
arXiv Detail & Related papers (2024-02-14T21:33:13Z)
- Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs [16.508471997999496]
We investigate whether and how language models understand the chemical spatial structure from 1D sequences.
The results indicate that language models can understand chemical structures from the perspective of molecular fragments.
arXiv Detail & Related papers (2024-01-15T12:53:58Z)
- Compositional Representation of Polymorphic Crystalline Materials [56.80318252233511]
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information.
Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- Structured information extraction from complex scientific text with fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z)
- Hyperbolic Molecular Representation Learning for Drug Repositioning [19.73556079390888]
A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure.
Here, we develop a semi-supervised drug embedding that incorporates two sources of information.
We show that the learned drug embedding can induce the hierarchical relations among drugs.
arXiv Detail & Related papers (2022-07-06T20:20:29Z)
- Neural networks for Anatomical Therapeutic Chemical (ATC) [83.73971067918333]
We propose combining multiple multi-label classifiers trained on distinct sets of features, including sets extracted from a Bidirectional Long Short-Term Memory Network (BiLSTM).
Experiments demonstrate the power of this approach, which is shown to outperform the best methods reported in the literature.
arXiv Detail & Related papers (2021-01-22T19:49:47Z)
- Semi-Supervised Hierarchical Drug Embedding in Hyperbolic Space [19.73556079390888]
A drug hierarchy is a valuable source that encodes human knowledge of drug relations in a tree-like structure.
Here, we develop a semi-supervised drug embedding that incorporates two sources of information.
We show that the learned drug embedding can be used to find new uses for existing drugs and to discover side-effects.
arXiv Detail & Related papers (2020-06-01T14:46:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.