Large Language Models aren't all that you need
- URL: http://arxiv.org/abs/2401.00698v1
- Date: Mon, 1 Jan 2024 08:32:50 GMT
- Title: Large Language Models aren't all that you need
- Authors: Kiran Voderhobli Holla, Chaithanya Kumar, Aryan Singh
- Abstract summary: This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II.
We evaluate and compare two approaches: (a) a traditional Conditional Random Fields (CRF) model and (b) a Large Language Model (LLM) fine-tuned with a customized head.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the architecture and systems built towards solving the
SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity
Recognition) [1]. We evaluate and compare two approaches: (a) a traditional Conditional Random Fields (CRF) model and (b) a Large Language Model (LLM) fine-tuned with a customized head. The novel ideas explored are:
1) Decaying auxiliary loss (with residual) - we train the model on an auxiliary task of Coarse-Grained NER and include this task as part of the loss function;
2) Triplet token blending - we explore ways of blending the embeddings of neighboring tokens in the final NER layer prior to prediction;
3) Task-optimal heads - we explore a variety of custom heads and learning rates for the final layer of the LLM.
We also explore multiple LLMs, including GPT-3, and experiment with a variety of dropout and other hyperparameter settings before arriving at our final model, which achieves micro/macro F1 of 0.85/0.84 on the dev set and 0.67/0.61 on the test data. We show that while pre-trained LLMs by themselves bring a large improvement in scores compared to traditional models, tangible further gains in Macro-F1 can be made by augmenting the LLM with the additional feature, loss, and model engineering techniques described above.
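Below is a minimal PyTorch sketch of ideas (1) and (2) from the abstract above: a decaying auxiliary coarse-grained NER loss with a residual floor, and a head that blends each token embedding with its immediate neighbours. The class and function names, dimensions, blend weight, and decay schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of "decaying auxiliary loss (with residual)" and "triplet token
# blending"; all names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class TripletBlendNERHead(nn.Module):
    """Custom NER head that blends each token embedding with its neighbours
    before predicting fine-grained and (auxiliary) coarse-grained labels."""

    def __init__(self, hidden_size, num_fine_labels, num_coarse_labels, blend_weight=0.2):
        super().__init__()
        self.blend_weight = blend_weight
        self.fine_classifier = nn.Linear(hidden_size, num_fine_labels)
        self.coarse_classifier = nn.Linear(hidden_size, num_coarse_labels)

    def forward(self, hidden_states):                 # (batch, seq_len, hidden)
        # Triplet token blending: mix each token with its left and right
        # neighbours (wrap-around at the sequence edges is ignored for brevity).
        left = torch.roll(hidden_states, shifts=1, dims=1)
        right = torch.roll(hidden_states, shifts=-1, dims=1)
        blended = (1 - 2 * self.blend_weight) * hidden_states \
            + self.blend_weight * left + self.blend_weight * right
        return self.fine_classifier(blended), self.coarse_classifier(blended)


def combined_loss(fine_logits, coarse_logits, fine_labels, coarse_labels,
                  epoch, decay=0.8, residual=0.1):
    """Decaying auxiliary loss with residual: the coarse-grained NER term
    shrinks each epoch but never drops below a small residual weight."""
    ce = nn.CrossEntropyLoss()
    fine_loss = ce(fine_logits.flatten(0, 1), fine_labels.flatten())
    coarse_loss = ce(coarse_logits.flatten(0, 1), coarse_labels.flatten())
    aux_weight = max(decay ** epoch, residual)
    return fine_loss + aux_weight * coarse_loss
```

Because the auxiliary weight never reaches zero, the residual floor keeps coarse-grained supervision available as a mild regularizer throughout training.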
Related papers
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
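The entry above describes capturing multi-objective trade-offs by ensembling the weights of specialized single-task models inside an MoE module. The sketch below is a generic illustration of gated weight fusion under our own assumptions (a single linear layer and a preference-conditioned gate); it is not the paper's method.

```python
# Generic sketch of MoE-style weight fusion over single-task experts
# (an illustrative assumption, not the paper's exact algorithm).
import torch
import torch.nn as nn


class WeightFusionMoE(nn.Module):
    """Fuses the weights of N single-task linear 'experts' with a learned gate,
    so one preference vector yields one point on the trade-off front."""

    def __init__(self, in_features, out_features, num_experts):
        super().__init__()
        self.expert_weights = nn.Parameter(torch.randn(num_experts, out_features, in_features))
        self.expert_bias = nn.Parameter(torch.zeros(num_experts, out_features))
        self.gate = nn.Linear(num_experts, num_experts)  # preference -> mixing logits

    def forward(self, x, preference):                    # preference: (num_experts,)
        mix = torch.softmax(self.gate(preference), dim=-1)
        fused_w = torch.einsum("e,eoi->oi", mix, self.expert_weights)
        fused_b = torch.einsum("e,eo->o", mix, self.expert_bias)
        return x @ fused_w.T + fused_b
```

Sweeping the preference vector then traces out an approximation of the Pareto front without retraining the experts.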
- CALRec: Contrastive Alignment of Generative LLMs for Sequential Recommendation [18.986613405565514]
Large Language Models (LLMs), pretrained on vast corpora of text, are adapted for sequential recommendation.
We propose a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss.
Our model significantly outperforms many state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-03T18:51:19Z)
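The CALRec entry above mixes a language-modeling loss with two contrastive losses in a two-tower setup. Below is a minimal sketch of such a loss mixture; the InfoNCE form, the choice of embedding pairs, and the weights alpha/beta are assumptions for illustration, not the paper's objective.

```python
# Generic sketch of mixing a language-modeling loss with two contrastive losses
# (the InfoNCE form and the weighting are illustrative assumptions).
import torch
import torch.nn.functional as F


def info_nce(a, b, temperature=0.07):
    """InfoNCE with in-batch negatives: positives sit on the diagonal."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.T / temperature                       # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def mixed_loss(lm_loss, user_emb, item_emb, text_emb, alpha=1.0, beta=1.0):
    """Total loss = language-modeling loss + two contrastive alignment terms
    (which embeddings are paired is an assumption for illustration)."""
    contrastive_a = info_nce(user_emb, item_emb)
    contrastive_b = info_nce(item_emb, text_emb)
    return lm_loss + alpha * contrastive_a + beta * contrastive_b
```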
- EntGPT: Linking Generative Large Language Models with Knowledge Bases [9.067856411512427]
The ability of Large Language Models to generate factually correct output remains relatively unexplored.
We design a three-step hard-prompting method to probe LLMs' entity disambiguation (ED) performance without supervised fine-tuning.
We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses.
arXiv Detail & Related papers (2024-02-09T19:16:27Z)
- A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm [7.521690071464451]
We propose a unified framework that implements a 1+N multi-task fine-tuning pattern in large language models (LLMs).
Our work aims to take advantage of both the MTL (i.e., CGC) and PEFT (i.e., LoRA) schemes.
arXiv Detail & Related papers (2024-01-22T07:58:31Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
- ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt a masked language modeling (MLM) task, a classification task, and a contrastive learning task.
In the fine-tuning stage, we use confident learning, the exponential moving average (EMA) method, adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
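Of the fine-tuning techniques listed in the entry above, R-Drop is the most self-contained; the snippet below sketches it under our own assumptions about the model interface and the penalty weight alpha.

```python
# Minimal sketch of the R-Drop regularizer used during fine-tuning
# (alpha and the classifier interface are illustrative assumptions).
import torch.nn.functional as F


def r_drop_loss(model, inputs, labels, alpha=4.0):
    """Two stochastic forward passes (dropout on) plus a symmetric KL penalty
    that keeps the two predictive distributions consistent."""
    logits1 = model(inputs)            # dropout makes each pass stochastic
    logits2 = model(inputs)
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = F.kl_div(p1, p2, log_target=True, reduction="batchmean") \
        + F.kl_div(p2, p1, log_target=True, reduction="batchmean")
    return ce + alpha * kl / 2
```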
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
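The MAMF entry above contrasts plain first-order multitask fine-tuning with MAML-style bi-level optimization. The loop below is a minimal sketch of the first-order variant; the task sampler, loss interface, and optimizer choice are assumptions, not the paper's code.

```python
# Minimal sketch of first-order multitask fine-tuning (no inner/outer MAML loops);
# the task-sampling and loss interfaces are illustrative assumptions.
import random
import torch


def multitask_finetune(model, task_loaders, loss_fn, steps=1000, lr=1e-5):
    """Each step samples one task's batch and takes an ordinary first-order
    gradient step on the shared model, instead of MAML-style bi-level updates."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    iterators = {name: iter(loader) for name, loader in task_loaders.items()}
    for _ in range(steps):
        name = random.choice(list(task_loaders))
        try:
            batch = next(iterators[name])
        except StopIteration:                    # restart an exhausted task loader
            iterators[name] = iter(task_loaders[name])
            batch = next(iterators[name])
        loss = loss_fn(model, batch, task=name)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```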