Maximizing Use-Case Specificity through Precision Model Tuning
- URL: http://arxiv.org/abs/2212.14206v1
- Date: Thu, 29 Dec 2022 07:50:14 GMT
- Title: Maximizing Use-Case Specificity through Precision Model Tuning
- Authors: Pranjali Awasthi, David Recio-Mitter, Yosuke Kyle Sugi
- Abstract summary: We present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval.
Our findings suggest that smaller models, with fewer than 10B parameters and fine-tuned on domain-specific datasets, tend to outperform larger language models on highly specific questions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models have become increasingly popular in recent years for tasks
like information retrieval. As use-cases become oriented toward specific
domains, fine-tuning becomes the default for achieving standard performance. To fine-tune
these models for specific tasks and datasets, it is necessary to carefully tune
the model's hyperparameters and training techniques. In this paper, we present
an in-depth analysis of the performance of four transformer-based language
models on the task of biomedical information retrieval. The models we consider
are DeepMind's RETRO (7B parameters), GPT-J (6B parameters), GPT-3 (175B
parameters), and BLOOM (176B parameters). We compare their performance on the
basis of relevance, accuracy, and interpretability, using a large corpus of
480,000 research papers on protein structure/function prediction as our dataset.
Our findings suggest that smaller models, with <10B parameters and fine-tuned
on domain-specific datasets, tend to outperform larger language models on
highly specific questions in terms of accuracy, relevance, and interpretability
by a significant margin (+50% on average). However, larger models do provide
generally better results on broader prompts.
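The paper does not include code, but the setup it describes (fine-tuning a smaller causal language model on a domain-specific corpus before querying it for biomedical retrieval) can be sketched roughly as below. The model name, corpus file, and hyperparameters are illustrative assumptions, not the authors' recipe.
```python
# Hypothetical sketch: domain-specific fine-tuning of one of the smaller
# models compared in the paper (GPT-J class) on a corpus of biomedical
# abstracts. File names and hyperparameters are illustrative assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

MODEL_NAME = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# "protein_papers.txt": one abstract per line (hypothetical local corpus).
dataset = load_dataset("text", data_files={"train": "protein_papers.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gptj-protein-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size of 16
    learning_rate=1e-5,               # small LR is typical when fine-tuning large LMs
    num_train_epochs=1,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```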
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, and finding categories where one language model outperforms another.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
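As a rough illustration of that paper's core idea (clusters parameterized by natural-language predicates whose denotation is computed by a language model), the sketch below replaces the LLM judge with a stubbed keyword check; the predicates and example texts are made up.
```python
# Clusters as natural-language predicates; membership is decided by a
# (here stubbed) denotation function. An illustration of the concept,
# not the authors' algorithm.
from typing import List

def predicate_holds(predicate: str, text: str) -> bool:
    """Stub for an LLM judging whether `predicate` is true of `text`.
    The paper computes this denotation with a language model; a trivial
    keyword check keeps the sketch self-contained."""
    keywords = predicate.replace("mentions", "").split()
    return all(word.lower() in text.lower() for word in keywords)

def assign_clusters(texts: List[str], predicates: List[str]) -> List[List[str]]:
    """Group each text under the first predicate that holds for it."""
    clusters = [[] for _ in predicates]
    for text in texts:
        for k, pred in enumerate(predicates):
            if predicate_holds(pred, text):
                clusters[k].append(text)
                break
    return clusters

dialogues = [
    "How do I approach fine-tuning a language model on protein papers?",
    "My training loss diverges when I raise the learning rate.",
]
predicates = ["mentions fine-tuning", "mentions learning rate"]
for pred, members in zip(predicates, assign_clusters(dialogues, predicates)):
    print(pred, "->", members)
```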
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the optimizers, parameterizations, and learning rates under study.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
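The kind of width-dependent learning-rate prescription studied in this line of work can be illustrated with a small sketch; the exponent and base values below are assumptions for illustration, not the paper's findings.
```python
# Illustrative width-dependent learning-rate scaling of the kind studied in
# parameterization work (e.g. muP-style rules for hidden layers).

def scaled_lr(base_lr: float, base_width: int, width: int, exponent: float = 1.0) -> float:
    """Scale a per-layer learning rate as (base_width / width) ** exponent.
    exponent = 1.0 mimics a 1/width rule for hidden layers; exponent = 0.0
    means no scaling (standard parameterization)."""
    return base_lr * (base_width / width) ** exponent

base_lr, base_width = 3e-4, 256      # tuned at a small proxy width
for width in (256, 1024, 4096, 16384):
    print(width, f"hidden-layer lr = {scaled_lr(base_lr, base_width, width):.2e}")
```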
- Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning [2.5168710814072894]
This study addresses the practical need for a unified evaluation of tabular data generation models.
We propose a reduced search space for each model that allows for quick optimization.
For most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations.
arXiv Detail & Related papers (2024-06-18T07:27:38Z)
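A minimal sketch of dataset-specific tuning over a deliberately reduced search space, assuming a random-search budget and a placeholder objective where the study would actually train a generator and score its synthetic data:
```python
# Random search over a small, "quick to optimize" search space. The space,
# budget, and dummy objective are assumptions for illustration.
import random

SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [256, 512, 1024],
    "epochs": [100, 300],
}

def sample_config(rng: random.Random) -> dict:
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def evaluate(config: dict) -> float:
    """Placeholder for: train the generator with `config`, synthesize data,
    and return a quality score (higher is better)."""
    return -abs(config["lr"] - 3e-4) + 0.001 * config["epochs"]

rng = random.Random(0)
best = max((sample_config(rng) for _ in range(20)), key=evaluate)
print("best configuration found:", best)
```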
- Large language models for aspect-based sentiment analysis [0.0]
We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv Detail & Related papers (2023-10-27T10:03:21Z)
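A hedged sketch of the few-shot setting: the prompt template, in-context examples, and output format are illustrative assumptions, and the LLM call itself is replaced by a mock completion.
```python
# Few-shot prompting for joint aspect term extraction and polarity
# classification; the actual model call is left as a stub.

FEW_SHOT_EXAMPLES = [
    ("The battery lasts forever but the screen is dim.",
     "battery: positive; screen: negative"),
    ("Great pasta, terrible service.",
     "pasta: positive; service: negative"),
]

def build_prompt(review: str) -> str:
    lines = ["Extract aspect terms and their sentiment polarity."]
    for text, labels in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}\nAspects: {labels}")
    lines.append(f"Review: {review}\nAspects:")
    return "\n\n".join(lines)

def parse_aspects(completion: str) -> dict:
    """Parse 'aspect: polarity; aspect: polarity' into a dict."""
    pairs = (item.split(":") for item in completion.split(";") if ":" in item)
    return {aspect.strip(): polarity.strip() for aspect, polarity in pairs}

prompt = build_prompt("The keyboard feels cheap, but shipping was fast.")
mock_completion = "keyboard: negative; shipping: positive"  # stand-in for an LLM response
print(parse_aspects(mock_completion))
```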
- Fairer and More Accurate Tabular Models Through NAS [14.147928131445852]
We propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO), in the first application of these methods to the challenging domain of tabular data.
We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns.
We produce architectures that consistently dominate state-of-the-art bias mitigation methods in fairness, accuracy, or both.
arXiv Detail & Related papers (2023-10-18T17:56:24Z)
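The multi-objective selection step at the heart of such a search can be sketched as a Pareto-front filter over (accuracy, fairness-gap) pairs; the candidate values below are made up for illustration.
```python
# Keep only Pareto-optimal candidates when maximizing accuracy and
# minimizing a fairness gap.
from typing import List, Tuple

def pareto_front(candidates: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """candidates: (name, accuracy, fairness_gap); higher accuracy and lower
    gap are better. A candidate is kept if no other candidate dominates it."""
    front = []
    for name, acc, gap in candidates:
        dominated = any(
            other_acc >= acc and other_gap <= gap and (other_acc, other_gap) != (acc, gap)
            for _, other_acc, other_gap in candidates
        )
        if not dominated:
            front.append((name, acc, gap))
    return front

trials = [("arch_a", 0.86, 0.09), ("arch_b", 0.84, 0.03), ("arch_c", 0.83, 0.08)]
print(pareto_front(trials))  # arch_a and arch_b survive; arch_c is dominated by arch_b
```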
- Pruning Large Language Models via Accuracy Predictor [0.0]
Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks.
We propose a novel pruning approach: first, a training set of architecture-accuracy pairs is established, and then a non-neural model is trained as an accuracy predictor.
arXiv Detail & Related papers (2023-09-18T06:38:24Z)
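A minimal sketch of the accuracy-predictor idea, assuming pruning configurations encoded as per-block sparsity ratios and a gradient-boosted regressor as the non-neural predictor; the data are toy values.
```python
# Fit a non-neural regressor on (pruning-configuration, measured accuracy)
# pairs, then rank unseen configurations without evaluating each one.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row encodes a candidate architecture, e.g. per-block sparsity ratios.
configs = np.array([
    [0.0, 0.0, 0.0],
    [0.2, 0.1, 0.3],
    [0.5, 0.4, 0.5],
    [0.7, 0.6, 0.8],
    [0.3, 0.3, 0.2],
])
measured_accuracy = np.array([0.72, 0.71, 0.66, 0.58, 0.70])  # from real evaluations

predictor = GradientBoostingRegressor(n_estimators=200, random_state=0)
predictor.fit(configs, measured_accuracy)

# Rank new pruning candidates by predicted accuracy instead of evaluating them.
candidates = np.array([[0.4, 0.2, 0.3], [0.6, 0.6, 0.6], [0.1, 0.1, 0.1]])
scores = predictor.predict(candidates)
best = candidates[np.argmax(scores)]
print("predicted accuracies:", np.round(scores, 3), "-> best candidate:", best)
```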
- Text Alignment Is An Efficient Unified Model for Massive NLP Tasks [24.069447197357164]
Next-word prediction is often not an efficient formulation for many NLP tasks.
We propose text alignment as an efficient unified model for a wide range of crucial tasks.
Our model delivers on par or even superior performance with much smaller model sizes.
arXiv Detail & Related papers (2023-07-06T02:28:31Z)
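To illustrate how heterogeneous tasks can be reduced to one alignment interface, the sketch below stubs the alignment model with a word-overlap score; a trained alignment model would replace that stub.
```python
# Several tasks recast as one alignment call scoring whether text_b is
# supported by text_a. The word-overlap scorer is a stand-in for a learned model.
import string

def _words(text: str) -> list:
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def alignment_score(text_a: str, text_b: str) -> float:
    """Stub alignment model: fraction of text_b's words that appear in text_a."""
    a_words, b_words = set(_words(text_a)), _words(text_b)
    return sum(w in a_words for w in b_words) / max(len(b_words), 1)

def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> bool:
    """Paraphrase detection as mutual alignment."""
    return min(alignment_score(s1, s2), alignment_score(s2, s1)) >= threshold

def answer_supported(context: str, answer: str, threshold: float = 0.5) -> bool:
    """QA verification as alignment of the answer against the context."""
    return alignment_score(context, answer) >= threshold

print(is_paraphrase("the cat sat on the mat", "the cat is on the mat"))  # True
print(answer_supported("RETRO has 7B parameters.", "7B parameters"))     # True
```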
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
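A rough sketch of a training-data-free, heavy-tail style metric: estimating a power-law tail exponent of a layer's eigenvalue spectrum with a Hill-type estimator. This conveys the flavor of such metrics and is not the exact estimator used in the papers above.
```python
# Estimate a power-law tail exponent for the eigenvalue spectrum of a weight
# matrix; no training or test data is needed.
import numpy as np

def hill_alpha(weight: np.ndarray, tail_fraction: float = 0.1) -> float:
    """Tail-index estimate from the top `tail_fraction` of eigenvalues of W^T W."""
    eigs = np.sort(np.linalg.svd(weight, compute_uv=False) ** 2)[::-1]
    k = max(int(tail_fraction * len(eigs)), 2)
    tail, cutoff = eigs[:k], eigs[k]
    return 1.0 + k / np.sum(np.log(tail / cutoff))

rng = np.random.default_rng(0)
random_layer = rng.normal(size=(768, 3072))                  # iid Gaussian init
heavy_layer = random_layer * (1.0 + rng.pareto(2.0, size=(768, 1)))  # heavier-tailed rows
print("random init alpha:", round(hill_alpha(random_layer), 2))
print("heavier-tailed alpha:", round(hill_alpha(heavy_layer), 2))
```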
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize classical factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
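For reference, a compact statement of the model class and of the generic penalized quasi-likelihood idea; the notation follows the standard GLLVM formulation and the objective below is the usual PQL form, not necessarily the paper's exact algorithm.
```latex
% Standard GLLVM: link-transformed mean as fixed effects plus latent factors.
\[
  g\!\left(\mathbb{E}[y_{ij} \mid \mathbf{u}_i]\right)
    = \beta_{0j} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_j + \mathbf{u}_i^{\top}\boldsymbol{\lambda}_j,
  \qquad \mathbf{u}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d).
\]
% Generic penalized quasi-likelihood objective: joint log-likelihood of the
% responses plus a Gaussian penalty on the latent scores.
\[
  \ell_{\mathrm{PQL}}(\Theta, \mathbf{U})
    = \sum_{i=1}^{n}\sum_{j=1}^{m} \log f\!\left(y_{ij} \mid \mathbf{u}_i, \Theta\right)
      - \tfrac{1}{2}\sum_{i=1}^{n} \|\mathbf{u}_i\|^{2}.
\]
```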