Is Transfer Learning Necessary for Protein Landscape Prediction?
- URL: http://arxiv.org/abs/2011.03443v1
- Date: Sat, 31 Oct 2020 20:41:36 GMT
- Title: Is Transfer Learning Necessary for Protein Landscape Prediction?
- Authors: Amir Shanehsazzadeh, David Belanger, David Dohan
- Abstract summary: We show that CNN models trained solely using supervised learning both compete with and sometimes outperform the best models from TAPE.
The benchmarking tasks proposed by TAPE are excellent measures of a model's ability to predict protein function and should be used going forward.
- Score: 14.098875826640883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been great interest in learning how to best represent
proteins, specifically with fixed-length embeddings. Deep learning has become a
popular tool for protein representation learning as a model's hidden layers
produce potentially useful vector embeddings. TAPE introduced a number of
benchmark tasks and showed that semi-supervised learning, via pretraining
language models on a large protein corpus, improved performance on downstream
tasks. Two of the tasks (fluorescence prediction and stability prediction)
involve learning fitness landscapes. In this paper, we show that CNN models
trained solely using supervised learning both compete with and sometimes
outperform the best models from TAPE that leverage expensive pretraining on
large protein datasets. These CNN models are sufficiently simple and small that
they can be trained using a Google Colab notebook. We also find for the
fluorescence task that linear regression outperforms our models and the TAPE
models. The benchmarking tasks proposed by TAPE are excellent measures of a
model's ability to predict protein function and should be used going forward.
However, we believe it is important to add baselines from simple models to put
the performance of the semi-supervised models that have been reported so far
into perspective.
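To make the abstract's claim concrete, below is a minimal sketch of the kind of small, supervised 1D CNN baseline the paper argues for. The architecture details here (channel count, kernel size, one-hot encoding, pooling, and the `SmallCNN` name) are illustrative assumptions, not the authors' exact model or hyperparameters.

```python
# Hypothetical sketch of a small supervised CNN baseline for fitness
# regression (e.g., fluorescence or stability). Sizes and vocabulary are
# illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 canonical residues
AA_TO_IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq: str) -> torch.Tensor:
    """Encode a protein sequence as a (20, L) one-hot tensor."""
    x = torch.zeros(len(AMINO_ACIDS), len(seq))
    for j, aa in enumerate(seq):
        x[AA_TO_IDX[aa], j] = 1.0
    return x

class SmallCNN(nn.Module):
    """A compact 1D CNN regressor trained purely with supervised learning."""
    def __init__(self, channels: int = 64, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(20, channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),           # pool over sequence length
        )
        self.head = nn.Linear(channels, 1)     # scalar fitness prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 20, L) one-hot sequences -> (batch,) fitness scores
        return self.head(self.conv(x).squeeze(-1)).squeeze(-1)

# Minimal training step with mean-squared-error loss on toy data.
model = SmallCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seqs = ["MKTAYIAKQR", "MKTAYIAKQQ"]            # toy sequences of equal length
targets = torch.tensor([0.8, 0.3])             # toy fitness labels
batch = torch.stack([one_hot(s) for s in seqs])
loss = nn.functional.mse_loss(model(batch), targets)
opt.zero_grad(); loss.backward(); opt.step()
```

A model of this size can be trained in a Google Colab notebook, and for the fluorescence task the paper reports that even ordinary linear regression on sequence features outperforms both the CNN and the TAPE models, which is the point of adding such simple baselines.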
Related papers
- Metalic: Meta-Learning In-Context with Protein Language Models [5.868595531658237]
Machine learning has emerged as a promising technique for protein fitness prediction tasks.
Due to data scarcity, we believe meta-learning will play a pivotal role in advancing protein engineering.
arXiv Detail & Related papers (2024-10-10T20:19:35Z) - Evolving Subnetwork Training for Large Language Models [19.54861230097017]
We propose a novel training paradigm: Evolving Subnetwork Training (EST)
EST samples subnetworks from the layers of the large language model and from commonly used modules within each layer.
We apply EST to train GPT2 model and TinyLlama model, resulting in 26.7% FLOPs saving for GPT2 and 25.0% for TinyLlama without an increase in loss on the pre-training dataset.
arXiv Detail & Related papers (2024-06-11T05:44:56Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of
Language Models [40.54353850357839]
We show how we can employ submodular optimization to select highly representative subsets of the training corpora.
We show that the resulting models achieve up to $\sim 99\%$ of the performance of the fully-trained models.
arXiv Detail & Related papers (2023-05-11T09:24:41Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Reprogramming Pretrained Language Models for Protein Sequence
Representation Learning [68.75392232599654]
We propose Representation Reprogramming via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch.
In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models?
Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on a small dataset, and that these simple models train in much less time than fine-tuning the pretrained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.