Sensitivity Analysis on Transferred Neural Architectures of BERT and
GPT-2 for Financial Sentiment Analysis
- URL: http://arxiv.org/abs/2207.03037v1
- Date: Thu, 7 Jul 2022 01:38:07 GMT
- Title: Sensitivity Analysis on Transferred Neural Architectures of BERT and
GPT-2 for Financial Sentiment Analysis
- Authors: Tracy Qian, Andy Xie, Camille Bruckmann
- Abstract summary: We investigate the performance and sensitivity of transferred neural architectures from pre-trained GPT-2 and BERT models.
It is also clear that the earlier layers of GPT-2 and BERT contain essential word pattern information that should be maintained.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The explosion of novel NLP word embedding and deep learning techniques has
spurred significant effort toward potential applications. One of these
directions is the financial sector. Although much work has been done on
state-of-the-art models like GPT and BERT, there is relatively little work on
how well these methods perform when fine-tuned after pre-training, or on how
sensitive their parameters are. We investigate the
performance and sensitivity of transferred neural architectures from
pre-trained GPT-2 and BERT models. We test fine-tuning performance while varying
which transformer layers are frozen, the batch size, and the learning rate. We find the
parameters of BERT are hypersensitive to stochasticity in fine-tuning and that
GPT-2 is more stable in such practice. It is also clear that the earlier layers
of GPT-2 and BERT contain essential word pattern information that should be
maintained.
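To make the layer-freezing experiment concrete, here is a minimal sketch of how the earlier layers of both models can be frozen before fine-tuning, assuming the HuggingFace transformers library; the freezing split (first 6 of 12 layers) and the 3-class sentiment head are illustrative choices, not the paper's exact configuration.
```python
# Sketch: freeze the earliest transformer layers before fine-tuning.
# Assumes HuggingFace `transformers`; the split (6 of 12 layers) is illustrative.
import torch
from transformers import BertForSequenceClassification, GPT2ForSequenceClassification

N_FROZEN = 6  # hypothetical choice: keep the lower half of the stack fixed

bert = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # e.g. negative / neutral / positive
)
# Freeze embeddings and the first N_FROZEN encoder layers.
for param in bert.bert.embeddings.parameters():
    param.requires_grad = False
for layer in bert.bert.encoder.layer[:N_FROZEN]:
    for param in layer.parameters():
        param.requires_grad = False

gpt2 = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=3)
gpt2.config.pad_token_id = gpt2.config.eos_token_id  # GPT-2 has no pad token
for param in gpt2.transformer.wte.parameters():
    param.requires_grad = False
for block in gpt2.transformer.h[:N_FROZEN]:
    for param in block.parameters():
        param.requires_grad = False

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in bert.parameters() if p.requires_grad), lr=2e-5
)
```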
Related papers
- BERT vs GPT for financial engineering
The paper benchmarks several Transformer models to show how these models can judge sentiment from a news event.
We find that fine-tuned BERT models outperform fine-tuned or vanilla GPT models on this task.
arXiv Detail & Related papers (2024-04-24T11:30:04Z)
- Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT
We present Sensi-BERT, a sensitivity-driven, efficient fine-tuning scheme for BERT models on downstream tasks.
Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, SST-2 and SQuAD.
arXiv Detail & Related papers (2023-07-14T17:24:15Z)
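Sensi-BERT's precise sensitivity analysis is defined in the paper; the sketch below only illustrates the general shape of a sensitivity-driven scheme, scoring parameter tensors by mean absolute gradient on a calibration batch and fine-tuning only the highest-scoring ones. The toy model, scoring rule, and top-k selection are all assumptions.
```python
# Sketch of a gradient-magnitude sensitivity score (not Sensi-BERT's exact rule).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

# Hypothetical calibration batch.
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
loss_fn(model(x), y).backward()

# Score each parameter tensor by its mean absolute gradient.
scores = {name: p.grad.abs().mean().item() for name, p in model.named_parameters()}

# Fine-tune only the top-k most sensitive tensors; freeze the rest.
k = 2
top = set(sorted(scores, key=scores.get, reverse=True)[:k])
for name, p in model.named_parameters():
    p.requires_grad = name in top
```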
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
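As one concrete PEFT recipe of the kind this line of work studies, the sketch below trains only bias terms (in the style of BitFit) and reports the trainable fraction; it is an illustrative baseline, not the specific method benchmarked in the paper.
```python
# Sketch: bias-only fine-tuning, one simple PEFT baseline (BitFit-style).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 128), nn.GELU(), nn.Linear(128, 10))

# Mark only bias vectors as trainable.
trainable = 0
for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias")
    trainable += p.numel() if p.requires_grad else 0

total = sum(p.numel() for p in model.parameters())
print(f"training {trainable}/{total} parameters ({100 * trainable / total:.2f}%)")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```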
- Exploring Extreme Parameter Compression for Pre-trained Language Models
This work explores larger compression ratios for pre-trained language models (PLMs).
Two decomposition and reconstruction protocols are proposed to improve the effectiveness and efficiency during compression.
A tiny version achieves $96.7\%$ of BERT-base performance with $1/48$ of the encoder parameters and $2.7\times$ faster inference.
arXiv Detail & Related papers (2022-05-20T09:16:55Z)
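The paper proposes its own decomposition and reconstruction protocols; as a generic stand-in, the sketch below compresses a linear layer with a truncated SVD, replacing one weight matrix with two low-rank factors. The rank and layer size are arbitrary.
```python
# Sketch: compress a linear layer via truncated SVD (illustrative only;
# not the decomposition protocol proposed in the paper).
import torch
import torch.nn as nn

def factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with factors of shapes (rank x in) and (out x rank)."""
    W = linear.weight.data  # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]  # (rank, in)
    second.weight.data = U[:, :rank] * S[:rank].sqrt()            # (out, rank)
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(768, 768)
compressed = factorize(layer, rank=64)  # 768*768 -> 2*64*768 weights (~6x fewer)
```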
- BiBERT: Accurate Fully Binarized BERT
BiBERT is an accurate, fully binarized BERT that eliminates performance bottlenecks.
Our method yields impressive savings of 56.3 times in FLOPs and 31.2 times in model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
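BiBERT's full method goes beyond a plain sign function; the sketch below shows only the basic building block of weight binarization with a straight-through estimator, as an assumed simplification.
```python
# Sketch: 1-bit weight binarization with a straight-through estimator (STE).
# A simplification; BiBERT's actual scheme adds further corrective components.
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    def __init__(self, in_f: int, out_f: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.weight.abs().mean()          # per-tensor scaling factor
        w_bin = torch.sign(self.weight) * scale   # weights in {-scale, +scale}
        # STE: forward uses w_bin, backward passes gradients to self.weight.
        w = self.weight + (w_bin - self.weight).detach()
        return x @ w.t()

layer = BinaryLinear(768, 768)
out = layer(torch.randn(4, 768))
```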
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We also show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
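The stability question above echoes the main paper's finding that BERT is hypersensitive to stochasticity. As a minimal sketch of how run-to-run variance can be quantified, the snippet below repeats a toy training run across random seeds and reports the spread; the task and metric are stand-ins for a real fine-tuning job.
```python
# Sketch: quantify fine-tuning stochasticity by repeating a run across seeds.
import statistics
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(256, 16), torch.randn(256, 1)  # fixed toy dataset

def train_once(seed: int) -> float:
    torch.manual_seed(seed)  # seeds the model init (the stochastic part here)
    model = nn.Linear(16, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(50):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

losses = [train_once(seed) for seed in range(5)]
print(f"mean={statistics.mean(losses):.4f}  stdev={statistics.stdev(losses):.4f}")
```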
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
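DSEE embeds sparsity in both the weight updates and the final weights; the sketch below illustrates only the first half of that idea, restricting fine-tuning updates to a fixed sparse mask. The random mask is an assumption; DSEE derives its sparsity pattern rather than sampling it.
```python
# Sketch: sparse weight updates during fine-tuning (one half of DSEE's idea).
import torch
import torch.nn as nn

model = nn.Linear(64, 64)
density = 0.05  # update only ~5% of the weights

# Fixed random mask per parameter tensor (illustrative, not DSEE's pattern).
masks = {
    name: (torch.rand_like(p) < density).float()
    for name, p in model.named_parameters()
}

opt = torch.optim.SGD(model.parameters(), lr=0.1)  # no momentum, so grad masking suffices
x, y = torch.randn(32, 64), torch.randn(32, 64)
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    for name, p in model.named_parameters():
        p.grad.mul_(masks[name])  # zero out gradients outside the mask
    opt.step()
```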
- Kronecker Decomposition for GPT Compression
GPT is an auto-regressive, Transformer-based pre-trained language model that has attracted much attention in the natural language processing (NLP) domain.
Despite its superior performance, GPT can be prohibitively expensive to deploy on devices with limited computational power or memory.
In this work, we use Kronecker decomposition to compress the linear mappings of the GPT-2 model.
arXiv Detail & Related papers (2021-10-15T15:28:39Z)
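As a sketch of the Kronecker idea, the snippet below parameterizes a linear map as a Kronecker product of two much smaller factors; the factor shapes are arbitrary, and the paper's training and initialization details are omitted.
```python
# Sketch: a linear map whose weight is a Kronecker product of two small factors.
# Storing A (16x16) and B (48x48) costs ~2.6k params instead of 768*768 ~ 590k.
import torch
import torch.nn as nn

class KroneckerLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(torch.randn(16, 16) * 0.1)
        self.B = nn.Parameter(torch.randn(48, 48) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = torch.kron(self.A, self.B)  # (16*48, 16*48) = (768, 768)
        return x @ W.t()

layer = KroneckerLinear()
out = layer(torch.randn(4, 768))
```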
- TernaryBERT: Distillation-aware Ultra-low Bit BERT
We propose TernaryBERT, which ternarizes the weights of a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
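TernaryBERT couples ternarization with distillation; the sketch below covers only the weight-ternarization step, using a common threshold heuristic as an assumed stand-in for the paper's exact quantizer.
```python
# Sketch: ternarize a weight tensor to {-1, 0, +1} times a derived scale.
# The 0.7 * mean(|w|) threshold is a common heuristic, not TernaryBERT's rule.
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    delta = 0.7 * w.abs().mean()                               # threshold
    mask = (w.abs() > delta).float()                           # which weights survive
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)   # scale of kept weights
    return alpha * torch.sign(w) * mask

w = torch.randn(768, 768)
w_t = ternarize(w)
print(torch.unique(w_t).numel(), "distinct values")  # at most 3
```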
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
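As a sketch of the two mechanisms, self-ensemble can be realized as an exponential moving average (EMA) of the student's weights, and self-distillation as a loss term pulling the student toward that averaged teacher's predictions; the averaging scheme and loss weighting here are assumptions.
```python
# Sketch: self-ensemble as a weight EMA, self-distillation as a KL term
# against the EMA teacher's predictions. Weighting/averaging choices assumed.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(32, 4)
teacher = copy.deepcopy(student)  # self-ensemble: moving average of the student
for p in teacher.parameters():
    p.requires_grad = False

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
x, y = torch.randn(64, 32), torch.randint(0, 4, (64,))
ema_decay, distill_weight = 0.99, 0.5

for _ in range(100):
    opt.zero_grad()
    logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = F.cross_entropy(logits, y) + distill_weight * F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss.backward()
    opt.step()
    with torch.no_grad():  # update the self-ensemble teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1 - ema_decay)
```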
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.