FoodChem: A food-chemical relation extraction model
- URL: http://arxiv.org/abs/2110.02019v1
- Date: Tue, 5 Oct 2021 13:07:33 GMT
- Title: FoodChem: A food-chemical relation extraction model
- Authors: Gjorgjina Cenikj, Barbara Koroušić Seljak and Tome Eftimov
- Abstract summary: We present a new Relation Extraction (RE) model for identifying chemicals present in the composition of food entities.
The BioBERT model achieves the best results, with a macro-averaged F1 score of 0.902 in the unbalanced augmentation setting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present FoodChem, a new Relation Extraction (RE) model for
identifying chemicals present in the composition of food entities, based on
textual information provided in biomedical peer-reviewed scientific literature.
The RE task is treated as a binary classification problem, aimed at identifying
whether the contains relation exists between a food-chemical entity pair. This
is accomplished by fine-tuning BERT, BioBERT and RoBERTa transformer models.
For evaluation purposes, a novel dataset with annotated contains relations in
food-chemical entity pairs is generated, in a golden and silver version. The
models are integrated into a voting scheme in order to produce the silver
version of the dataset which we use for augmenting the individual models, while
the manually annotated golden version is used for their evaluation. Out of the
three evaluated models, the BioBERT model achieves the best results, with a
macro-averaged F1 score of 0.902 in the unbalanced augmentation setting.
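The voting step and the evaluation metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, so strict majority voting is an assumption here, and the function names `majority_vote` and `macro_f1` as well as the toy predictions are hypothetical. The sketch uses plain Python rather than the authors' code or any particular library.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` holds one list of labels per model (assumed equal length).
    An example gets None when no label has a strict majority.
    Majority voting is an assumption; the abstract does not name the rule.
    """
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else None)
    return voted


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical binary "contains" predictions from three fine-tuned models
# for four food-chemical entity pairs (1 = relation present, 0 = absent).
bert = [1, 0, 1, 1]
biobert = [1, 0, 0, 1]
roberta = [0, 0, 1, 1]

silver_labels = majority_vote([bert, biobert, roberta])
print(silver_labels)  # [1, 0, 1, 1]

# Score one model against hypothetical gold annotations.
gold = [1, 0, 1, 1]
print(macro_f1(gold, biobert))
```

With three binary voters a strict majority always exists; the `None` case only matters if an even number of models is used.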
Related papers
- FlavorDiffusion: Predicting Food Pairings and Chemical Interactions Using Diffusion Models [0.0]
This paper presents FlavorDiffusion, a novel framework leveraging diffusion models to predict food-chemical interactions and ingredient pairings.
By integrating graph-based embeddings, diffusion processes, and chemical property encoding, FlavorDiffusion addresses data imbalances and enhances clustering quality.
The proposed framework represents a significant step forward in computational gastronomy, offering scalable, interpretable, and chemically informed solutions for food science.
arXiv Detail & Related papers (2025-02-08T06:47:27Z)
- Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production [49.814615043389864]
We propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs.
We present the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms.
Our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
arXiv Detail & Related papers (2024-10-24T06:54:27Z)
- ReacLLaMA: Merging chemical and textual information in chemical reactivity AI models [0.0]
Chemical reactivity models are developed to predict chemical reaction outcomes in the form of classification (success/failure) or regression (product yield) tasks.
The vast majority of the reported models are trained solely on chemical information such as reactants, products, reagents, and solvents.
Herein, the incorporation of procedural text to augment the Graphormer reactivity model and improve its accuracy is presented.
arXiv Detail & Related papers (2024-01-30T18:57:08Z)
- Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach [0.0]
Sparsity of labelled data is an obstacle to the development of Relation Extraction models.
We create the first curated evaluation dataset and extracted literature items from the LOTUS database to build training sets.
We evaluate the performance of standard fine-tuning as a generative task and few-shot learning with open Large Language Models.
arXiv Detail & Related papers (2023-11-10T19:36:00Z)
- BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset.
arXiv Detail & Related papers (2023-06-19T22:48:18Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes [35.372588846754645]
ChemDisGene is a new dataset for training and evaluating multi-class multi-label document-level biomedical relation extraction models.
Our dataset contains 80k biomedical research abstracts labeled with mentions of chemicals, diseases, and genes.
arXiv Detail & Related papers (2022-04-13T18:02:05Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.