EPICURE Ensemble Pretrained Models for Extracting Cancer Mutations from
Literature
- URL: http://arxiv.org/abs/2106.07722v1
- Date: Fri, 11 Jun 2021 09:08:15 GMT
- Title: EPICURE Ensemble Pretrained Models for Extracting Cancer Mutations from
Literature
- Authors: Jiarun Cao, Elke M van Veen, Niels Peek, Andrew G Renehan, Sophia
Ananiadou
- Abstract summary: EPICURE is an ensemble pre-trained model equipped with a conditional random field pattern layer and a span prediction pattern layer to extract cancer mutations from text.
Experimental results on three benchmark datasets show competitive results compared to the baseline models.
- Score: 12.620782629498814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To interpret the genetic profile present in a patient sample, it is necessary
to know which mutations have important roles in the development of the
corresponding cancer type. Named entity recognition (NER) is a core step in the text
mining pipeline which facilitates mining valuable cancer information from the
scientific literature. However, due to the scarcity of related datasets,
previous NER attempts in this domain either suffer from low performance when
deep-learning-based models are deployed, or they apply feature-based machine
learning models or rule-based models to tackle this problem, which require
intensive effort from domain experts and limit the models' generalization
capability. In this paper, we propose EPICURE, an ensemble pre-trained model
equipped with a conditional random field pattern layer and a span prediction
pattern layer to extract cancer mutations from text. We also adopt a data
augmentation strategy to expand our training set from multiple datasets.
Experimental results on three benchmark datasets show competitive results
compared to the baseline models.
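The abstract names two output layers, a conditional random field (CRF) layer and a span prediction layer, but does not say how their predictions are merged. The sketch below is only an illustration under stated assumptions: it assumes the CRF layer emits BIO tags, that the span layer emits (start, end, label) triples, and that the two sets are combined by a simple union or intersection. The names `bio_to_spans`, `combine`, and the `Mutation` label are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (e.g. the output of a CRF layer) into spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def combine(crf_spans: Set[Span], span_layer_spans: Set[Span],
            mode: str = "union") -> Set[Span]:
    """Merge predictions from the two layers (assumed strategy, not the paper's)."""
    if mode == "union":          # favours recall: keep anything either layer found
        return crf_spans | span_layer_spans
    if mode == "intersection":   # favours precision: keep only agreed spans
        return crf_spans & span_layer_spans
    raise ValueError(f"unknown mode: {mode}")


# Toy example with made-up predictions for a six-token sentence.
tokens = ["The", "BRAF", "V600E", "mutation", "and", "p.G12D"]
crf_tags = ["O", "O", "B-Mutation", "O", "O", "O"]
span_predictions = {(2, 3, "Mutation"), (5, 6, "Mutation")}

crf_spans = bio_to_spans(crf_tags)
merged = combine(crf_spans, span_predictions, mode="union")
agreed = combine(crf_spans, span_predictions, mode="intersection")
```

Here `merged` keeps both candidate mentions, while `agreed` keeps only the span that both layers predicted. Which trade-off suits a given corpus is an empirical question that this sketch does not answer.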
Related papers
- Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data [0.0]
This pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated phenotyping from research survey data.
We employed BERN2, a named entity recognition and normalization model, to extract information from the ORIGINS survey data.
BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few Shot Inference and RAG orchestration, further improved accuracy.
arXiv Detail & Related papers (2024-10-28T02:55:03Z)
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- Meta-Learning on Augmented Gene Expression Profiles for Enhanced Lung Cancer Detection [3.7929238927240685]
We present a meta-learning-based approach for predicting lung cancer from gene expression profiles.
We employ four distinct datasets for the meta-learning tasks, where one serves as the target dataset and the rest as source datasets.
Results show the superior performance of meta-learning on augmented source data compared to the baselines trained on single datasets.
arXiv Detail & Related papers (2024-08-19T01:39:12Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- Graph Neural Networks for Breast Cancer Data Integration [0.0]
We propose a novel learning pipeline comprising three steps - the integration of cancer data modalities as graphs, followed by the application of Graph Neural Networks.
This project has the potential to improve cancer data understanding and encourages the transition of regular data sets to graph-shaped data.
arXiv Detail & Related papers (2022-11-28T17:10:19Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.