Understanding BLOOM: An empirical study on diverse NLP tasks
- URL: http://arxiv.org/abs/2211.14865v1
- Date: Sun, 27 Nov 2022 15:48:14 GMT
- Title: Understanding BLOOM: An empirical study on diverse NLP tasks
- Authors: Parag Pravin Dakle, SaiKrishna Rallabandi and Preethi Raghavan
- Abstract summary: We present an evaluation of smaller BLOOM model variants on various natural language processing tasks.
BLOOM variants under-perform on all GLUE tasks (except WNLI), question-answering, and text generation.
The variants bloom for WNLI, with an accuracy of 56.3%, and for prompt-based few-shot text extraction on MIT Movies and ATIS datasets.
- Score: 3.884530687475798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present an evaluation of smaller BLOOM model variants
(350m/560m and 1b3/1b7) on various natural language processing tasks. This
includes GLUE - language understanding, prompt-based zero-shot and few-shot
text classification and extraction, question answering, prompt-based text
generation, and multi-lingual text classification to understand model
strengths/weaknesses and behavior. Empirical results show that BLOOM variants
under-perform on all GLUE tasks (except WNLI), question-answering, and text
generation. The variants bloom for WNLI, with an accuracy of 56.3%, and for
prompt-based few-shot text extraction on MIT Movies and ATIS datasets. The
BLOOM variants on average have 7% greater accuracy over GPT-2 and GPT-Neo
models on Director and Airline Name extraction from MIT Movies and ATIS
datasets, respectively.
Related papers
- Exploration of Masked and Causal Language Modelling for Text Generation [6.26998839917804]
This paper conducts an extensive comparison of Masked Language Modelling (MLM) and Causal Language Modelling (CLM) approaches for text generation tasks.
We first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness.
The results show that MLM consistently outperforms CLM in text generation across all datasets.
arXiv Detail & Related papers (2024-05-21T09:33:31Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
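Precision/recall-style evaluation of generators is usually built on nearest-neighbour support estimates in an embedding space. The sketch below illustrates that generic construction (in the spirit of Kynkäänniemi et al.'s improved precision and recall); it is an assumed construction, not necessarily this paper's exact estimator, and the random vectors stand in for sentence embeddings of real and generated text.

```python
# Hedged sketch of embedding-space precision/recall for generated text.
# The k-NN support estimate is an assumed construction, not necessarily the
# paper's exact estimator; random vectors stand in for sentence embeddings.
import numpy as np

def knn_radii(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point in X to its k-th nearest neighbour in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def support_coverage(A: np.ndarray, B: np.ndarray, k: int = 3) -> float:
    """Fraction of points in A that fall inside B's estimated support."""
    radii = knn_radii(B, k)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

real = np.random.randn(200, 16)  # embeddings of human-written text (toy)
gen = np.random.randn(200, 16)   # embeddings of model-generated text (toy)
precision = support_coverage(gen, real)  # quality: gen inside real support
recall = support_coverage(real, gen)     # diversity: real inside gen support
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, precision measures quality (generated text stays within the support of human text) and recall measures diversity (human text is covered by the generator), matching the quality/diversity framing in the summary.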
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need [24.672621081551675]
We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM.
The proposed approach achieved state-of-the-art performance for 5 out of 7 major clinical NLP tasks using one unified generative LLM.
arXiv Detail & Related papers (2023-12-11T04:00:26Z)
- Token Prediction as Implicit Classification to Identify LLM-Generated Text [37.89852204279844]
This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.
Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task.
We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments.
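The reframing is straightforward to sketch with transformers: instead of bolting a classification head onto the encoder, the model is trained to emit a label word as its next tokens. The "classify:" prefix and the label vocabulary below are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of classification reframed as next-token (sequence-to-sequence)
# prediction with T5: the class is emitted as a label word instead of being
# read from an added classification head. The "classify:" prefix and the
# label word are illustrative assumptions, not the paper's exact setup.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "The generated paragraph reads fluently but repeats itself."
label = "machine"  # hypothetical label word, e.g. "machine" vs. "human"

# Training step: the target sequence is simply the label token(s).
inputs = tokenizer("classify: " + text, return_tensors="pt")
targets = tokenizer(label, return_tensors="pt").input_ids
loss = model(**inputs, labels=targets).loss  # standard seq2seq LM loss
loss.backward()  # backward pass for one illustrative gradient step

# Inference: greedy-decode a few tokens and read the label off the output.
with torch.no_grad():
    pred_ids = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```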
arXiv Detail & Related papers (2023-11-15T06:33:52Z)
- On Bilingual Lexicon Induction with Large Language Models [81.6546357879259]
We examine the potential of the latest generation of Large Language Models for the development of bilingual lexicons.
We study 1) zero-shot prompting for unsupervised BLI and 2) few-shot in-context prompting with a set of seed translation pairs.
Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs.
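The two prompting regimes are easy to make concrete. The sketch below builds the prompt strings only; the templates, language pair, and seed pairs are illustrative assumptions rather than the paper's exact prompts, and any text-to-text multilingual LLM could consume them.

```python
# Sketch of the two BLI prompting regimes: zero-shot, and few-shot with seed
# translation pairs as in-context demonstrations. Templates, language pair,
# and seed pairs are illustrative assumptions, not the paper's exact prompts.

def zero_shot_prompt(word: str, src: str = "German", tgt: str = "English") -> str:
    """Unsupervised BLI: ask for a translation with no examples."""
    return f"Translate the {src} word into {tgt}.\n{src}: {word}\n{tgt}:"

def few_shot_prompt(word, seed_pairs, src="German", tgt="English"):
    """In-context BLI: prepend seed translation pairs as demonstrations."""
    demos = "\n".join(f"{src}: {s}\n{tgt}: {t}" for s, t in seed_pairs)
    return f"{demos}\n{src}: {word}\n{tgt}:"

seeds = [("Hund", "dog"), ("Haus", "house")]  # hypothetical seed pairs
print(zero_shot_prompt("Katze"))
print(few_shot_prompt("Katze", seeds))
```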
arXiv Detail & Related papers (2023-10-21T12:43:27Z)
- Text Summarization Using Large Language Models: A Comparative Study of MPT-7b-instruct, Falcon-7b-instruct, and OpenAI Chat-GPT Models [0.0]
Leveraging Large Language Models (LLMs) has shown remarkable promise in enhancing summarization techniques.
This paper embarks on an exploration of text summarization with a diverse set of LLMs, including MPT-7b-instruct, falcon-7b-instruct, and OpenAI ChatGPT text-davinci-003 models.
arXiv Detail & Related papers (2023-10-16T14:33:02Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- LAraBench: Benchmarking Arabic AI with Large Language Models [26.249084464525044]
LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks.
We utilize models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM to tackle 33 distinct tasks across 61 publicly available datasets.
This involved 98 experimental setups, encompassing 296K data points, 46 hours of speech, and 30 sentences for Text-to-Speech (TTS).
arXiv Detail & Related papers (2023-05-24T10:16:16Z)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose TextFlint, a multilingual robustness evaluation platform for NLP tasks.
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.