A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products
- URL: http://arxiv.org/abs/2403.18958v1
- Date: Wed, 27 Mar 2024 19:02:56 GMT
- Title: A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products
- Authors: Harsh Patel, Dominique Boucher, Emad Fallahzadeh, Ahmed E. Hassan, Bram Adams,
- Abstract summary: This paper investigates the complexities of integrating Large Language Models into software products, with a focus on the challenges encountered for determining their readiness for release.
Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations.
The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies.
- Score: 8.986278918477595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.
Related papers
- Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software [0.0]
Large language model (LLM)-enabled systems are a significant challenge in software engineering.
We propose a probabilistic framework for systematically analyzing and improving these systems.
We apply the framework to the autoformalization problem, where natural language documentation is transformed into formal program specifications.
arXiv Detail & Related papers (2025-01-10T22:42:06Z) - The ELEVATE-AI LLMs Framework: An Evaluation Framework for Use of Large Language Models in HEOR: an ISPOR Working Group Report [12.204470166456561]
This article introduces the ELEVATE AI LLMs framework and checklist.
The framework comprises ten evaluation domains, including model characteristics, accuracy, comprehensiveness, and fairness.
Validation of the framework and checklist on studies of systematic literature reviews and health economic modeling highlighted their ability to identify strengths and gaps in reporting.
arXiv Detail & Related papers (2024-12-23T14:09:10Z) - Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products [21.486150701178154]
Large Language Models (LLMs) are increasingly embedded into software products across diverse industries.
This study explores the emerging solutions that software developers are adopting to navigate the encountered challenges.
arXiv Detail & Related papers (2024-10-15T21:11:10Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework synergizes open-world knowledge with collaborative knowledge.
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval.
DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task.
Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - RITFIS: Robust input testing framework for LLMs-based intelligent
software [6.439196068684973]
RITFIS is the first framework designed to assess the robustness of intelligent software against natural language inputs.
RITFIS adapts 17 automated testing methods, originally designed for Deep Neural Network (DNN)-based intelligent software.
It demonstrates the effectiveness of RITFIS in evaluating LLM-based intelligent software through empirical validation.
arXiv Detail & Related papers (2024-02-21T04:00:54Z) - A Case Study on Test Case Construction with Large Language Models:
Unveiling Practical Insights and Challenges [2.7029792239733914]
This paper examines the application of Large Language Models in the construction of test cases within the context of software engineering.
Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency.
arXiv Detail & Related papers (2023-12-19T20:59:02Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP)
This survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec)
arXiv Detail & Related papers (2023-05-31T13:51:26Z) - CRITIC: Large Language Models Can Self-Correct with Tool-Interactive
Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.