Fine-Grained Chemical Entity Typing with Multimodal Knowledge
Representation
- URL: http://arxiv.org/abs/2108.12899v1
- Date: Sun, 29 Aug 2021 19:41:35 GMT
- Title: Fine-Grained Chemical Entity Typing with Multimodal Knowledge
Representation
- Authors: Chenkai Sun, Weijiang Li, Jinfeng Xiao, Nikolaus Nova Parulian,
ChengXiang Zhai, Heng Ji
- Abstract summary: Extracting detailed knowledge about chemical reactions from the core chemistry literature is an emerging challenge.
We propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing.
Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.
- Score: 36.6963949360594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated knowledge discovery from trending chemical literature is essential
for more efficient biomedical research. How to extract detailed knowledge about
chemical reactions from the core chemistry literature is a new emerging
challenge that has not been well studied. In this paper, we study the new
problem of fine-grained chemical entity typing, which poses interesting new
challenges especially because of the complex name mentions frequently occurring
in chemistry literature and graphic representation of entities. We introduce a
new benchmark data set (CHEMET) to facilitate the study of the new task and
propose a novel multi-modal representation learning framework to solve the
problem of fine-grained chemical entity typing by leveraging external resources
with chemical structures and using cross-modal attention to learn effective
representation of text in the chemistry domain. Experiment results show that
the proposed framework outperforms multiple state-of-the-art methods.
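The abstract names cross-modal attention but gives no detail, so the following is only a minimal sketch of the general technique, not the authors' architecture. It assumes text token vectors act as queries over chemical-structure vectors (keys and values) using plain scaled dot-product attention with a residual sum and no learned projections. The names `cross_modal_attention`, `text_vecs`, and `struct_vecs` are hypothetical, and the numbers are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_vecs, struct_vecs):
    """Let each text vector attend over structure vectors.

    Assumption: both modalities already share the same dimension, so no
    learned query/key/value projections are applied in this sketch.
    Returns (fused text vectors, attention weights per text token).
    """
    dim = len(struct_vecs[0])
    scale = math.sqrt(dim)
    fused, all_weights = [], []
    for query in text_vecs:
        # Scaled dot-product score between this token and each structure vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in struct_vecs
        ]
        weights = softmax(scores)
        # Weighted average of the structure vectors (the attended context).
        context = [
            sum(w * value[i] for w, value in zip(weights, struct_vecs))
            for i in range(dim)
        ]
        # Residual sum keeps the original text signal alongside the context.
        fused.append([q + c for q, c in zip(query, context)])
        all_weights.append(weights)
    return fused, all_weights


if __name__ == "__main__":
    # Toy inputs: two text tokens, three structure fragments (hypothetical values).
    text_vecs = [[1.0, 0.0], [0.0, 1.0]]
    struct_vecs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    fused, weights = cross_modal_attention(text_vecs, struct_vecs)
    for token_weights in weights:
        print([round(w, 3) for w in token_weights])
```

Each row of attention weights sums to 1, and a token's weight is largest on the structure vector it aligns with most closely. A real model would add learned projections, multiple heads, and trained encoders for both modalities.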
Related papers
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area [50.15254966969718]
We introduce ChemVLM, an open-source chemical multimodal large language model for chemical applications.
ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information.
We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv Detail & Related papers (2024-08-14T01:16:40Z)
- CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature [4.086092284014203]
We propose a methodology that involves augmenting existing annotated text corpora with knowledge from ChEBI and fine-tuning a large language model (LLM) to recognize chemical entities and their roles in scientific text.
By combining ontological knowledge with the language understanding capabilities of LLMs, we achieve high precision and recall rates in identifying both the chemical entities and roles in scientific literature.
arXiv Detail & Related papers (2024-07-31T15:56:06Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Predictive Chemistry Augmented with Text Retrieval [37.59545092901872]
We introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant for a given chemical reaction, and then aligns them with the molecular representation of the reaction.
We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis.
arXiv Detail & Related papers (2023-12-08T07:40:59Z)
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
- ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [27.850325653751078]
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
arXiv Detail & Related papers (2023-07-04T02:52:30Z)
- Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model [3.3031562864527664]
We propose a unified framework that addresses both the reaction representation learning and molecule generation tasks.
Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model.
Our framework achieves state-of-the-art results on challenging downstream tasks.
arXiv Detail & Related papers (2023-03-13T10:06:41Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- BERT Learns (and Teaches) Chemistry [5.653789128055942]
We propose the use of attention to study functional groups and other property-impacting molecular substructures from a data-driven perspective.
We then apply the representations of functional groups and atoms learned by the model to tackle problems of toxicity, solubility, drug-likeness, and accessibility.
arXiv Detail & Related papers (2020-07-11T00:23:07Z)