Simple is Better and Large is Not Enough: Towards Ensembling of
Foundational Language Models
- URL: http://arxiv.org/abs/2308.12272v1
- Date: Wed, 23 Aug 2023 17:40:35 GMT
- Title: Simple is Better and Large is Not Enough: Towards Ensembling of
Foundational Language Models
- Authors: Nancy Tyagi, Aidin Shiri, Surjodeep Sarkar, Abhishek Kumar Umrawal,
Manas Gaur
- Abstract summary: Foundational Language Models (FLMs) have advanced natural language processing (NLP) research.
Current researchers are developing larger FLMs to enable contextualized language representation, classification, and generation.
While developing larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty.
- Score: 8.902348901281334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundational Language Models (FLMs) have advanced natural language processing
(NLP) research. Current researchers are developing larger FLMs (e.g., XLNet,
T5) to enable contextualized language representation, classification, and
generation. While developing larger FLMs has been of significant advantage, it
is also a liability concerning hallucination and predictive uncertainty.
Fundamentally, larger FLMs are built on the same foundations as smaller FLMs
(e.g., BERT); hence, one must recognize the potential of smaller FLMs which can
be realized through an ensemble. In the current research, we perform a reality
check on FLMs and their ensemble on benchmark and real-world datasets. We
hypothesize that the ensembling of FLMs can influence the individualistic
attention of FLMs and unravel the strength of coordination and cooperation of
different FLMs. We utilize BERT and define three ensemble techniques: Shallow,
Semi, and Deep, wherein the Deep-Ensemble introduces a knowledge-guided
reinforcement learning approach. We find that the proposed Deep-Ensemble BERT
outperforms its larger variant, BERT-large, many times over on datasets that
demonstrate the usefulness of NLP in sensitive fields such as mental health.
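The abstract does not spell out how the Shallow, Semi, and Deep ensembles are built. As a rough illustration only, the sketch below assumes a shallow ensemble simply averages the class probabilities of several independently fine-tuned BERT-base classifiers; the checkpoint names and two-class setup are placeholders, not the paper's actual configuration:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoints; in practice these would be separately fine-tuned
# BERT-base models (e.g., trained with different seeds or data splits).
MODEL_NAMES = ["bert-base-uncased"] * 3

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
models = [
    AutoModelForSequenceClassification.from_pretrained(name, num_labels=2).eval()
    for name in MODEL_NAMES
]

def ensemble_predict(texts):
    # Average softmax probabilities across ensemble members, then take argmax.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(m(**batch).logits, dim=-1) for m in models]
        ).mean(dim=0)  # (models, batch, classes) -> (batch, classes)
    return probs.argmax(dim=-1)

print(ensemble_predict(["I feel hopeful today.", "Nothing seems worth it."]))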
Related papers
- Tiny language models [0.0]
This study explores whether tiny language models (TLMs) exhibit the same key qualitative features as large language models (LLMs). We demonstrate that TLMs exhibit a clear performance gap between pre-trained and non-pre-trained models across classification tasks. The classification accuracy achieved by a pre-trained deep TLM architecture can be replicated through a soft committee of multiple, independently pre-trained shallow architectures.
arXiv Detail & Related papers (2025-07-20T08:49:57Z) - Feasibility with Language Models for Open-World Compositional Zero-Shot Learning [96.6544564242316]
In Open-World Compositional Zero-Shot Learning, all possible state-object combinations are considered as unseen classes. Our work focuses on using external auxiliary knowledge to determine the feasibility of state-object combinations.
arXiv Detail & Related papers (2025-05-16T12:37:08Z) - DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis [62.31018417955254]
DeepMLF is a novel multimodal language model with learnable tokens tailored toward deep fusion.
Our results confirm that deeper fusion leads to better performance, with optimal fusion depths (5-7) exceeding those of existing approaches.
arXiv Detail & Related papers (2025-04-15T11:28:02Z) - Federated In-Context LLM Agent Learning [3.4757641432843487]
Large Language Models (LLMs) have revolutionized intelligent services by enabling logical reasoning, tool use, and interaction with external systems as agents.
In this paper, we propose a novel privacy-preserving Federated In-context LLM Agent Learning (FICAL) algorithm.
The results show that FICAL has competitive performance compared to other SOTA baselines with a significant communication cost decrease of $\mathbf{3.33 \times 10^{5}}$ times.
arXiv Detail & Related papers (2024-12-11T03:00:24Z) - Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z) - ST-LLM: Large Language Models Are Effective Temporal Learners [58.79456373423189]
Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation.
How to effectively encode and understand videos in video-based dialogue systems remains an open problem.
We propose ST-LLM, an effective video-LLM baseline with spatial-temporal sequence modeling inside LLM.
arXiv Detail & Related papers (2024-03-30T10:11:26Z) - When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z) - Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation [23.736611338497244]
TinyLLM is a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs.
We introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios.
Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size.
arXiv Detail & Related papers (2024-02-07T06:48:24Z) - Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z) - Okapi: Instruction-tuned Large Language Models in Multiple Languages
with Reinforcement Learning from Human Feedback [61.83548032416181]
We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
arXiv Detail & Related papers (2023-07-29T18:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.