An Efficient Consolidation of Word Embedding and Deep Learning
Techniques for Classifying Anticancer Peptides: FastText+BiLSTM
- URL: http://arxiv.org/abs/2309.12058v1
- Date: Thu, 21 Sep 2023 13:25:11 GMT
- Title: An Efficient Consolidation of Word Embedding and Deep Learning
Techniques for Classifying Anticancer Peptides: FastText+BiLSTM
- Authors: Onur Karakaya and Zeynep Hilal Kilimci
- Abstract summary: Anticancer peptides (ACPs) are peptides with a higher degree of selectivity and safety.
Recent scientific advancements have generated interest in peptide-based therapies.
ACPs offer the advantage of efficiently treating intended cells without negatively impacting normal cells.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Anticancer peptides (ACPs) are a group of peptides that exhibit
antineoplastic properties. The utilization of ACPs in cancer prevention can
present a viable substitute for conventional cancer therapeutics, as they
possess a higher degree of selectivity and safety. Recent scientific
advancements have generated interest in peptide-based therapies, which offer the
advantage of efficiently treating intended cells without negatively impacting
normal cells. However, as the number of peptide sequences continues to increase
rapidly, developing a reliable and precise prediction model becomes a
challenging task. In this work, we aim to develop an efficient model
for classifying anticancer peptides by combining word
embedding and deep learning models. First, Word2Vec and FastText are evaluated
as word embedding techniques for extracting representations of peptide sequences.
Then, the outputs of the word embedding models are fed into the deep learning approaches
CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework,
extensive experiments are carried out on two datasets widely used in the literature,
ACPs250 and Independent. Experimental results show that the proposed model
improves classification accuracy compared to state-of-the-art studies.
The proposed combination, FastText+BiLSTM, achieves 92.50% accuracy on the
ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby
setting a new state of the art.
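The embedding step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how peptides are split into "words", so the overlapping k-mer tokenization below, the function names, and the example sequence `FLPKL` are all assumptions made for illustration, not the authors' method. The subword step follows the general FastText idea of wrapping a word in `<` and `>` and collecting its character n-grams; the 3 to 6 range matches common FastText defaults and is not taken from the paper.

```python
from typing import List


def peptide_to_words(sequence: str, k: int = 3) -> List[str]:
    """Split a peptide into overlapping k-mers, treated as 'words'.

    Hypothetical tokenization: the paper's abstract does not specify one.
    """
    sequence = sequence.strip().upper()
    if len(sequence) < k:
        # Too short for a full k-mer: keep the whole sequence as one word.
        return [sequence] if sequence else []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def fasttext_subwords(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of '<word>', plus the full bracketed word.

    FastText represents a word by the sum of the vectors of such subwords,
    which lets it build vectors for k-mers never seen during training.
    """
    wrapped = f"<{word}>"
    grams = [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]
    if wrapped not in grams:
        grams.append(wrapped)
    return grams


# Illustrative peptide only; not taken from ACPs250 or Independent.
words = peptide_to_words("FLPKL", k=3)
print(words)
print(fasttext_subwords(words[0]))
```

In a full pipeline, the resulting per-word vectors would form the input sequence for a classifier such as a BiLSTM; that part is omitted here because the abstract gives no architectural details.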
Related papers
- Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction [0.0]
We propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction.
Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance.
arXiv Detail & Related papers (2024-07-12T04:04:54Z)
- A large language model for predicting T cell receptor-antigen binding specificity [4.120928123714289]
We propose a Masked Language Model (MLM) to overcome limitations in model generalization.
Specifically, we randomly mask sequence segments and train tcrLM to infer the masked segments, thereby extracting expressive features from TCR sequences.
Our extensive experimental results demonstrate that tcrLM achieved AUC values of 0.937 and 0.933 on independent test sets and external validation sets.
arXiv Detail & Related papers (2024-06-24T08:36:40Z)
- Contrastive learning of T cell receptor representations [11.053778245621544]
We introduce a TCR language model called SCEPTR, capable of data-efficient transfer learning.
We introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling.
We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
arXiv Detail & Related papers (2024-06-10T15:50:45Z)
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach leverages clinical knowledge from a large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z)
- ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approach [0.0]
Anticancer peptides (ACPs) are molecules that have gained significant attention in the field of cancer research and therapy.
ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells.
ACPs are being investigated as potential candidates for cancer therapy.
arXiv Detail & Related papers (2024-01-04T08:19:27Z)
- BeeTLe: A Framework for Linear B-Cell Epitope Prediction and Classification [0.43512163406551996]
This paper presents a new deep learning-based framework for linear B-cell prediction as well as antibody type-specific classification.
We propose an amino acid encoding method based on eigen decomposition to help the model learn the representations of antibodies.
Experimental results on data curated from the largest public database demonstrate the validity of the proposed methods.
arXiv Detail & Related papers (2023-09-05T09:18:29Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.