Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications
- URL: http://arxiv.org/abs/2406.13713v2
- Date: Fri, 06 Dec 2024 21:24:16 GMT
- Title: Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications
- Authors: Mahaman Sanoussi Yahaya Alassan, Jessica López Espejel, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri,
- Abstract summary: Large Language Models (LLMs) have recently demonstrated remarkable performance in various Natural Language Processing (NLP) applications.<n>This article presents a comparative analysis between open-source LLMs and proprietary models on this task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have recently demonstrated remarkable performance in various Natural Language Processing (NLP) applications, such as sentiment analysis, content generation, and personalized recommendations. Despite their impressive capabilities, there remains a significant need for systematic studies concerning the practical application of LLMs in industrial settings, as well as the specific requirements and challenges related to their deployment in these contexts. This need is particularly critical for Machine Reading Comprehension (MCR), where factual, concise, and accurate responses are required. To date, most MCR rely on Small Language Models (SLMs) or Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). This trend is evident in the SQuAD2.0 rankings on the Papers with Code table. This article presents a comparative analysis between open-source LLMs and proprietary models on this task, aiming to identify light and open-source alternatives that offer comparable performance to proprietary models.
Related papers
- A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering [12.504438766461027]
Large Language Models (LLMs) have transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories.
Our research packages a framework, coined Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES)
Our findings indicate that standardizing prompt engineering and using PRIMES can enhance the reliability and accuracy of studies utilizing LLMs.
arXiv Detail & Related papers (2024-11-15T06:08:57Z) - Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG)
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
arXiv Detail & Related papers (2024-11-01T20:44:59Z) - A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables.
Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols.
This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z) - Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability [0.0]
This paper presents an analysis of open-source large language models (LLMs) and their application in Retrieval-Augmented Generation (RAG) tasks.
Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems.
arXiv Detail & Related papers (2024-06-17T11:22:25Z) - Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
We propose a LLM-based Generative IoT (GIoT) system deployed in the local network setting in this study.
arXiv Detail & Related papers (2024-06-14T19:24:00Z) - Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks [12.400599440431188]
Information Extraction (IE) plays a crucial role in Natural Language Processing (NLP)
Recent experiments focusing on English IE tasks have shed light on the challenges faced by Large Language Models (LLMs) in achieving optimal performance.
arXiv Detail & Related papers (2024-06-04T08:00:40Z) - Assessing and Verifying Task Utility in LLM-Powered Applications [28.41607905656699]
Large Language Models (LLMs) have led to a surge in applications that facilitate collaboration among agents, assisting humans in their daily tasks.
This highlights the need to verify utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs.
We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application.
arXiv Detail & Related papers (2024-05-03T15:26:27Z) - Automating Customer Needs Analysis: A Comparative Study of Large Language Models in the Travel Industry [2.4244694855867275]
Large Language Models (LLMs) have emerged as powerful tools for extracting valuable insights from vast amounts of textual data.
In this study, we conduct a comparative analysis of LLMs for the extraction of travel customer needs from TripAdvisor posts.
Our findings highlight the efficacy of opensource LLMs, particularly Mistral 7B, in achieving comparable performance to larger closed models.
arXiv Detail & Related papers (2024-04-27T18:28:10Z) - Towards Pareto Optimal Throughput in Small Language Model Serving [4.497936996651617]
Small Language Models (SLMs) offer new opportunities for resource-constrained users.
We present a set of experiments designed to benchmark SLM inference at performance and energy levels.
arXiv Detail & Related papers (2024-04-04T10:45:07Z) - A Review of Multi-Modal Large Language and Vision Models [1.9685736810241874]
Large Language Models (LLMs) have emerged as a focal point of research and application.
Recently, LLMs have been extended into multi-modal large language models (MM-LLMs)
This paper provides an extensive review of the current state of those LLMs with multi-modal capabilities as well as the very recent MM-LLMs.
arXiv Detail & Related papers (2024-03-28T15:53:45Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis.
It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models [33.50873478562128]
Large Language Models (LLMs) bring forth challenges in the high consumption of computational, memory, energy, and financial resources.
This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs.
arXiv Detail & Related papers (2024-01-01T01:12:42Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the textttLlama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models [11.845239346943067]
parameter-efficient fine-tuning (PEFT) is a promising approach to efficiently specialize large language models (LLMs) to task-specific data.
Our study highlights the potential for tuning larger LLMs and significant reductions in memory usage by combining PEFT with quantization.
arXiv Detail & Related papers (2023-08-21T04:31:06Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP)
This survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec)
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.