Large Language Models aren't all that you need
- URL: http://arxiv.org/abs/2401.00698v1
- Date: Mon, 1 Jan 2024 08:32:50 GMT
- Title: Large Language Models aren't all that you need
- Authors: Kiran Voderhobli Holla, Chaithanya Kumar, Aryan Singh
- Abstract summary: This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II.
We evaluate and compare two approaches: (a) a traditional Conditional Random Fields (CRF) model and (b) a Large Language Model (LLM) fine-tuned with a customized head.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the architecture and systems built towards solving the
SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity
Recognition) [1]. We evaluate and compare two approaches: (a) a traditional Conditional Random Fields (CRF) model and (b) a Large Language Model (LLM) fine-tuned with a customized head. The novel ideas explored are:
1) Decaying auxiliary loss (with residual) - we train the model on an auxiliary task of Coarse-Grained NER and include this task as part of the loss function;
2) Triplet token blending - we explore ways of blending the embeddings of neighboring tokens in the final NER layer prior to prediction;
3) Task-optimal heads - we explore a variety of custom heads and learning rates for the final layer of the LLM.
We also explore multiple LLMs, including GPT-3, and experiment with a variety of dropout and other hyperparameter settings before arriving at our final model, which achieves micro/macro F1 of 0.85/0.84 on the dev set and 0.67/0.61 on the test data. We show that while pre-trained LLMs by themselves bring a large improvement in scores compared to traditional models, tangible further gains in Macro-F1 can be made by augmenting the LLM with the additional feature, loss, and model engineering techniques described above.
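Below is a minimal PyTorch sketch of ideas (1) and (2) from the abstract above: a decaying auxiliary coarse-grained NER loss with a residual floor, and a head that blends each token embedding with its immediate neighbours. The class and function names, dimensions, blend weight, and decay schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of "decaying auxiliary loss (with residual)" and "triplet token
# blending"; all names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class TripletBlendNERHead(nn.Module):
    """Custom NER head that blends each token embedding with its neighbours
    before predicting fine-grained and (auxiliary) coarse-grained labels."""

    def __init__(self, hidden_size, num_fine_labels, num_coarse_labels, blend_weight=0.2):
        super().__init__()
        self.blend_weight = blend_weight
        self.fine_classifier = nn.Linear(hidden_size, num_fine_labels)
        self.coarse_classifier = nn.Linear(hidden_size, num_coarse_labels)

    def forward(self, hidden_states):                 # (batch, seq_len, hidden)
        # Triplet token blending: mix each token with its left and right
        # neighbours (wrap-around at the sequence edges is ignored for brevity).
        left = torch.roll(hidden_states, shifts=1, dims=1)
        right = torch.roll(hidden_states, shifts=-1, dims=1)
        blended = (1 - 2 * self.blend_weight) * hidden_states \
            + self.blend_weight * left + self.blend_weight * right
        return self.fine_classifier(blended), self.coarse_classifier(blended)


def combined_loss(fine_logits, coarse_logits, fine_labels, coarse_labels,
                  epoch, decay=0.8, residual=0.1):
    """Decaying auxiliary loss with residual: the coarse-grained NER term
    shrinks each epoch but never drops below a small residual weight."""
    ce = nn.CrossEntropyLoss()
    fine_loss = ce(fine_logits.flatten(0, 1), fine_labels.flatten())
    coarse_loss = ce(coarse_logits.flatten(0, 1), coarse_labels.flatten())
    aux_weight = max(decay ** epoch, residual)
    return fine_loss + aux_weight * coarse_loss
```

Because the auxiliary weight never reaches zero, the residual floor keeps coarse-grained supervision available as a mild regularizer throughout training.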
Related papers
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
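The entry above describes capturing multi-objective trade-offs by ensembling the weights of specialized single-task models inside an MoE module. The sketch below is a generic illustration of gated weight fusion under our own assumptions (a single linear layer and a preference-conditioned gate); it is not the paper's method.

```python
# Generic sketch of MoE-style weight fusion over single-task experts
# (an illustrative assumption, not the paper's exact algorithm).
import torch
import torch.nn as nn


class WeightFusionMoE(nn.Module):
    """Fuses the weights of N single-task linear 'experts' with a learned gate,
    so one preference vector yields one point on the trade-off front."""

    def __init__(self, in_features, out_features, num_experts):
        super().__init__()
        self.expert_weights = nn.Parameter(torch.randn(num_experts, out_features, in_features))
        self.expert_bias = nn.Parameter(torch.zeros(num_experts, out_features))
        self.gate = nn.Linear(num_experts, num_experts)  # preference -> mixing logits

    def forward(self, x, preference):                    # preference: (num_experts,)
        mix = torch.softmax(self.gate(preference), dim=-1)
        fused_w = torch.einsum("e,eoi->oi", mix, self.expert_weights)
        fused_b = torch.einsum("e,eo->o", mix, self.expert_bias)
        return x @ fused_w.T + fused_b
```

Sweeping the preference vector then traces out an approximation of the Pareto front without retraining the experts.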
- CALRec: Contrastive Alignment of Generative LLMs for Sequential Recommendation [18.986613405565514]
Large Language Models (LLMs), pretrained on vast corpora of text, are adapted for sequential recommendation.
We propose a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss.
Our model significantly outperforms many state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-03T18:51:19Z)
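The CALRec entry above mixes a language-modeling loss with two contrastive losses in a two-tower setup. Below is a minimal sketch of such a loss mixture; the InfoNCE form, the choice of embedding pairs, and the weights alpha/beta are assumptions for illustration, not the paper's objective.

```python
# Generic sketch of mixing a language-modeling loss with two contrastive losses
# (the InfoNCE form and the weighting are illustrative assumptions).
import torch
import torch.nn.functional as F


def info_nce(a, b, temperature=0.07):
    """InfoNCE with in-batch negatives: positives sit on the diagonal."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.T / temperature                       # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def mixed_loss(lm_loss, user_emb, item_emb, text_emb, alpha=1.0, beta=1.0):
    """Total loss = language-modeling loss + two contrastive alignment terms
    (which embeddings are paired is an assumption for illustration)."""
    contrastive_a = info_nce(user_emb, item_emb)
    contrastive_b = info_nce(item_emb, text_emb)
    return lm_loss + alpha * contrastive_a + beta * contrastive_b
```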
- EntGPT: Linking Generative Large Language Models with Knowledge Bases [9.067856411512427]
The ability of Large Language Models to generate factually correct output remains relatively unexplored.
We design a three-step hard-prompting method to probe LLMs' entity disambiguation (ED) performance without supervised fine-tuning.
We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses.
arXiv Detail & Related papers (2024-02-09T19:16:27Z)
- A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm [7.521690071464451]
We propose a unified framework that implements a 1+N multi-task fine-tuning pattern in large language models (LLMs).
Our work aims to take advantage of both the MTL (i.e., CGC) and PEFT (i.e., LoRA) schemes.
arXiv Detail & Related papers (2024-01-22T07:58:31Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
- ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt a masked language modeling (MLM) task, a classification task, and a contrastive learning task.
In the fine-tuning stage, we use confident learning, the exponential moving average (EMA) method, adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
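Of the fine-tuning techniques listed in the entry above, R-Drop is the most self-contained; the snippet below sketches it under our own assumptions about the model interface and the penalty weight alpha.

```python
# Minimal sketch of the R-Drop regularizer used during fine-tuning
# (alpha and the classifier interface are illustrative assumptions).
import torch.nn.functional as F


def r_drop_loss(model, inputs, labels, alpha=4.0):
    """Two stochastic forward passes (dropout on) plus a symmetric KL penalty
    that keeps the two predictive distributions consistent."""
    logits1 = model(inputs)            # dropout makes each pass stochastic
    logits2 = model(inputs)
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = F.kl_div(p1, p2, log_target=True, reduction="batchmean") \
        + F.kl_div(p2, p1, log_target=True, reduction="batchmean")
    return ce + alpha * kl / 2
```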
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
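The MAMF entry above contrasts plain first-order multitask fine-tuning with MAML-style bi-level optimization. The loop below is a minimal sketch of the first-order variant; the task sampler, loss interface, and optimizer choice are assumptions, not the paper's code.

```python
# Minimal sketch of first-order multitask fine-tuning (no inner/outer MAML loops);
# the task-sampling and loss interfaces are illustrative assumptions.
import random
import torch


def multitask_finetune(model, task_loaders, loss_fn, steps=1000, lr=1e-5):
    """Each step samples one task's batch and takes an ordinary first-order
    gradient step on the shared model, instead of MAML-style bi-level updates."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    iterators = {name: iter(loader) for name, loader in task_loaders.items()}
    for _ in range(steps):
        name = random.choice(list(task_loaders))
        try:
            batch = next(iterators[name])
        except StopIteration:                    # restart an exhausted task loader
            iterators[name] = iter(task_loaders[name])
            batch = next(iterators[name])
        loss = loss_fn(model, batch, task=name)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```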