GPT-Neo for commonsense reasoning -- a theoretical and practical lens
- URL: http://arxiv.org/abs/2211.15593v2
- Date: Wed, 27 Sep 2023 08:01:39 GMT
- Title: GPT-Neo for commonsense reasoning -- a theoretical and practical lens
- Authors: Rohan Kashyap, Vivek Kashyap, Narendra C.P.
- Abstract summary: We evaluate the performance of the GPT-Neo model on $6$ commonsense reasoning benchmark tasks.
We examine how the smaller GPT-Neo models perform against several larger model baselines.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has demonstrated substantial gains from pre-training large language
models (LLMs) followed by supervised fine-tuning on the downstream task. In
this paper, we evaluate the performance of the GPT-Neo model on $6$
commonsense reasoning benchmark tasks. We examine how the smaller GPT-Neo
models perform against several larger model baselines such as GPT-$3$,
Llama-$2$, MPT and Falcon. When fine-tuned with an appropriate set of
hyperparameters, our model achieves competitive accuracy on several tasks. We
also investigate and substantiate our results using attention-head
visualization to better understand model performance. Finally, we conduct a
range of robustness tests to gauge model performance under different settings.
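As a rough illustration of the recipe the abstract describes, the following is a minimal sketch (not the authors' released code) that fine-tunes a small GPT-Neo checkpoint on commonsense question-answer text with Hugging Face Transformers and then extracts per-layer attention maps of the kind used for attention-head visualization. The checkpoint name, prompt format, toy examples, and learning rate are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"      # smallest GPT-Neo checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # GPT-Neo ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy stand-ins for commonsense benchmark items ("goal -> solution" style);
# the paper fine-tunes on 6 benchmark datasets rather than this toy list.
examples = [
    ("How do you open a jar with a tight lid?",
     "Run the lid under hot water, then twist it off."),
    ("How can you keep bread fresh for longer?",
     "Store it in a sealed bag away from direct sunlight."),
]
texts = [f"Question: {q}\nAnswer: {a}{tokenizer.eos_token}" for q, a in examples]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore padding in the LM loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed learning rate
model.train()
for _ in range(3):  # a few passes over the toy batch, purely for illustration
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)               # causal-LM loss over question+answer
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Attention-head inspection: per-layer attention maps for a probe sentence,
# the kind of tensor one would plot to visualize individual heads.
model.eval()
probe = tokenizer("The trophy didn't fit in the suitcase because it was too big.",
                  return_tensors="pt")
with torch.no_grad():
    attentions = model(**probe, output_attentions=True).attentions
print(len(attentions), attentions[0].shape)  # num_layers, [batch, heads, seq, seq]
```

A larger checkpoint (e.g. GPT-Neo 1.3B or 2.7B) drops in by changing only `model_name`; the training loop and attention extraction are unchanged.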
Related papers
- Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training quantization (PTQ) has been extensively adopted for compressing large language models (LLMs).
Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
- Distilling foundation models for robust and efficient models in digital pathology [32.99044401004595]
We distilled a large foundation model into a smaller one, reducing the number of parameters by several orders of magnitude.
Our model, H0-mini, achieves nearly comparable performance to large FMs at a significantly reduced inference cost.
It is evaluated on several public benchmarks, achieving 3rd place on the HEST benchmark and 5th place on the EVA benchmark.
arXiv Detail & Related papers (2025-01-27T17:35:39Z)
- GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning [48.71952325015267]
We apply PEFT methods to a modified Retrieval-Enhanced Transformer (RETRO) and a baseline GPT model across several sizes.
We show that RETRO models outperform GPT models in zero-shot settings due to their unique pre-training process.
This work presents the first comprehensive comparison of various PEFT methods integrated with RAG, applied to both GPT and RETRO models.
arXiv Detail & Related papers (2024-07-05T14:16:47Z)
- Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models [21.17021844323919]
We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters.
We find that FFT leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale.
arXiv Detail & Related papers (2024-01-01T15:30:19Z)
- PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation [97.78045712375047]
We present a new efficient model architecture for large language models (LLMs).
We show that PanGu-$\pi$-7B can achieve performance comparable to that of benchmark models with about a 10% inference speed-up.
In addition, we have deployed PanGu-$\pi$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application.
arXiv Detail & Related papers (2023-12-27T11:49:24Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks [4.872984658007499]
Implicit models such as Deep Equilibrium Models (DEQs) have garnered significant attention in the community for their ability to train infinite-layer models.
We revisit this line of implicit models and trace them back to the original weight-tied models.
Surprisingly, we observe that weight-tied models are more effective, stable, and efficient on vision tasks than the DEQ variants.
arXiv Detail & Related papers (2023-07-16T11:45:35Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness (see the weight-averaging sketch after this list).
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
- Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing (see the routing sketch after this list).
This strategy improves model quality while maintaining constant computational cost, and our further exploration of extremely large-scale models shows that it is more effective for training larger models.
arXiv Detail & Related papers (2021-05-31T16:12:44Z)
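For the model-soup entry above, here is a minimal sketch of the core idea under stated assumptions: uniformly averaging the weights of several models fine-tuned from the same initialization with different hyperparameters. The checkpoint file names are placeholders.

```python
import torch

def uniform_soup(state_dicts):
    """Uniformly average a list of state_dicts with identical keys and shapes."""
    soup = {}
    for key in state_dicts[0]:
        soup[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return soup

# Usage (placeholder paths): average checkpoints fine-tuned with different
# hyperparameters, then load the soup into a single model for inference.
# sds = [torch.load(p, map_location="cpu") for p in ("run_a.pt", "run_b.pt", "run_c.pt")]
# model.load_state_dict(uniform_soup(sds))
```

The paper's greedy variant uses the same averaging primitive but only keeps checkpoints that improve held-out accuracy.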
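For the sparse-expert entry above, here is a minimal sketch of the routing idea as summarized there: experts are split into $k$ prototype groups and top-$1$ routing is applied within each group, so $k$ experts are active per token. The layer sizes and one-router-per-group design are illustrative assumptions, and every expert is evaluated densely here for clarity rather than efficiency.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypedMoE(nn.Module):
    """Split num_experts experts into k prototype groups; route top-1 per group."""

    def __init__(self, d_model=256, d_ff=512, num_experts=8, k_prototypes=2):
        super().__init__()
        assert num_experts % k_prototypes == 0
        self.per_group = num_experts // k_prototypes
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # One router per prototype group, scoring only that group's experts.
        self.routers = nn.ModuleList(
            nn.Linear(d_model, self.per_group) for _ in range(k_prototypes)
        )

    def forward(self, x):                                   # x: [batch, seq, d_model]
        out = torch.zeros_like(x)
        for g, router in enumerate(self.routers):
            gate = F.softmax(router(x), dim=-1)             # [batch, seq, per_group]
            top_p, top_i = gate.max(dim=-1)                 # top-1 expert inside group g
            for j in range(self.per_group):
                chosen = (top_i == j).unsqueeze(-1)         # tokens routed to expert j
                expert = self.experts[g * self.per_group + j]
                # Dense evaluation for clarity; a real MoE dispatches only routed tokens.
                out = out + chosen * top_p.unsqueeze(-1) * expert(x)
        return out

moe = PrototypedMoE()
print(moe(torch.randn(2, 4, 256)).shape)                    # torch.Size([2, 4, 256])
```

With `k_prototypes=1` this collapses to ordinary top-1 routing over all experts.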
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.