Vortex under Ripplet: An Empirical Study of RAG-enabled Applications
- URL: http://arxiv.org/abs/2407.05138v1
- Date: Sat, 6 Jul 2024 17:25:11 GMT
- Title: Vortex under Ripplet: An Empirical Study of RAG-enabled Applications
- Authors: Yuchen Shao, Yuheng Huang, Jiawei Shen, Lei Ma, Ting Su, Chengcheng Wan
- Abstract summary: Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios.
We manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, along with their issue reports.
We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security.
- Score: 6.588605888228515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. However, developers face challenges in integrating RAG-enhanced LLMs into software systems due to the lack of interface specifications, requirements imposed by the software context, and complicated system management. In this paper, we manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, along with their issue reports. We found that more than 98% of the applications contain multiple integration defects that harm software functionality, efficiency, and security. We also generalized 19 defect patterns and proposed guidelines to tackle them. We hope this work can aid LLM-enabled software development and motivate future research.
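The paper itself ships no code, so as orientation only, here is a minimal sketch of the retrieve-then-generate call path that such RAG-enabled applications wrap. Every name in it is hypothetical, and the defensive checks illustrate the general kind of integration concern the study raises, not any specific one of its 19 patterns.

```python
# A minimal, illustrative sketch of a RAG call path; every name here is
# hypothetical and not taken from the paper or any particular library.
from typing import List

def retrieve(query: str, k: int = 4) -> List[str]:
    # Stand-in for a vector-store lookup; a real application queries an index.
    corpus = ["RAG pairs a retriever with a generator.",
              "Retrieved passages ground the model's answer."]
    return corpus[:k]

def llm_complete(prompt: str) -> str:
    # Stand-in for an LLM API call.
    return f"[model answer grounded in {len(prompt)} prompt characters]"

def answer(query: str, max_context_chars: int = 8000) -> str:
    passages = retrieve(query)
    if not passages:
        # Handle empty retrieval explicitly instead of silently sending the
        # model an empty context; an integration oversight of the general
        # kind the paper's defect patterns concern.
        return "No supporting documents found."
    context = "\n\n".join(passages)[:max_context_chars]  # bound context size
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_complete(prompt)

print(answer("What does RAG add to an LLM?"))
```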
Related papers
- Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework [58.36391985790157]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code.
We explore the use of large language models (LLMs) to improve exception handling in code.
We propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling.
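As a toy illustration of the defect class Seeker targets (this is not the Seeker framework, and the helpers are invented for the example), compare a broad handler that swallows errors with scoped, informative handling:

```python
# Toy illustration of improper vs. proper exception handling; this is not
# the Seeker framework, and all names here are invented for the example.
import json
import logging

logger = logging.getLogger(__name__)

def load_config_fragile(path: str) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:  # improper: hides missing files, bad JSON, permissions
        return {}

def load_config_robust(path: str) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning("config %s missing; falling back to defaults", path)
        return {}
    except json.JSONDecodeError as exc:
        # Surface corrupt input instead of silently proceeding with defaults.
        raise ValueError(f"config {path} is not valid JSON: {exc}") from exc
```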
arXiv Detail & Related papers (2024-12-16T12:35:29Z)
- A Real-World Benchmark for Evaluating Fine-Grained Issue Solving Capabilities of Large Language Models [11.087034068992653]
FAUN-Eval is a benchmark specifically designed to evaluate the Fine-grAined issUe solviNg capabilities of LLMs.
It is constructed using a dataset curated from 30 well-known GitHub repositories.
We evaluate ten LLMs with FAUN-Eval, including four closed-source and six open-source models.
arXiv Detail & Related papers (2024-11-27T03:25:44Z)
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
- Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products [21.486150701178154]
Large Language Models (LLMs) are increasingly embedded into software products across diverse industries.
This study explores the emerging solutions that software developers are adopting to navigate the encountered challenges.
arXiv Detail & Related papers (2024-10-15T21:11:10Z)
- Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach [54.03528377384397]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code.
We explore the use of large language models (LLMs) to improve exception handling in code.
We propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling.
arXiv Detail & Related papers (2024-10-09T14:45:45Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
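A hedged sketch of what a training-free critique-and-repair loop of this general shape can look like; the loop structure and the `llm` callable are assumptions for illustration, not the paper's exact method:

```python
# Generic generate-critique-repair loop in the spirit of the summary above;
# the structure is an assumption for illustration, not the paper's algorithm.
from typing import Callable

def compile_feedback(code: str) -> str:
    """Return syntax diagnostics for Python `code`, or '' if it compiles."""
    try:
        compile(code, "<candidate>", "exec")  # built-in syntax check only
        return ""
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"

def iterative_repair(task: str, llm: Callable[[str], str], rounds: int = 3) -> str:
    code = llm(f"Write Python code for: {task}")
    for _ in range(rounds):
        feedback = compile_feedback(code)
        if not feedback:
            break  # compiles cleanly; stop critiquing
        # Feed the compiler diagnostics back so the model critiques and
        # corrects its own output, with no additional training involved.
        code = llm(
            f"Task: {task}\nYour code fails to compile ({feedback}).\n"
            f"Fix it:\n{code}"
        )
    return code
```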
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, making them better aligned with the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products [8.986278918477595]
This paper investigates the complexities of integrating Large Language Models into software products, with a focus on the challenges encountered for determining their readiness for release.
Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations.
The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies.
arXiv Detail & Related papers (2024-03-27T19:02:56Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- LLM4SecHW: Leveraging Domain Specific Large Language Model for Hardware Debugging [4.297043877989406]
This paper presents a novel framework for hardware debugging that leverages domain-specific Large Language Models (LLMs).
We propose a unique approach to compile a dataset of open source hardware design defects and their remediation steps.
LLM4SecHW employs fine-tuning of medium-sized LLMs based on this dataset, enabling the identification and rectification of bugs in hardware designs.
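For flavor, a hedged sketch of how such a defect-and-remediation dataset might be serialized for supervised fine-tuning; the record fields, example content, and file name are invented, not the paper's schema:

```python
# Hypothetical serialization of (defect, remediation) pairs into JSONL for
# supervised fine-tuning; field names and content are invented for this sketch.
import json

records = [
    {"prompt": "Bug: reset signal is not synchronized in the always block.",
     "completion": "Add a two-stage synchronizer for the reset signal."},
]

with open("hw_defects.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one training record per line
```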
arXiv Detail & Related papers (2024-01-28T19:45:25Z)
- Large Language Models for Software Engineering: Survey and Open Problems [35.29302720251483]
This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE).
Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.
arXiv Detail & Related papers (2023-10-05T13:33:26Z)