Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
- URL: http://arxiv.org/abs/2405.19888v1
- Date: Thu, 30 May 2024 09:46:36 GMT
- Title: Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
- Authors: Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu
- Abstract summary: Parrot is a service system that focuses on the end-to-end experience of LLM-based applications.
A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests.
- Score: 11.894203842968745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strength of LLM and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. Public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a brand-new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.
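To make the abstraction concrete, here is a minimal, purely illustrative Python sketch of the idea; the names `SemanticVariable` and `llm_request` are hypothetical and do not reflect Parrot's actual API. Two LLM requests are connected through Semantic Variables, so a serving system could see, before either prompt runs, that the second request consumes the output of the first.

```python
# Purely illustrative sketch; `SemanticVariable` and `llm_request` are
# hypothetical names, not Parrot's actual API.

class SemanticVariable:
    """A named placeholder for an input or output of an LLM request."""
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

def llm_request(template, inputs, output):
    """Submit one templated LLM request whose result fills `output`."""
    # A real service would record the producer/consumer edges between the
    # variables here and schedule the request; we just stub the completion.
    prompt = template.format(**{k: v.value for k, v in inputs.items()})
    output.value = f"<completion of: {prompt!r}>"
    return output

# Application-level workflow: summarize an article, then translate the summary.
article = SemanticVariable("article", value="LLM serving systems are ...")
summary = SemanticVariable("summary")
translation = SemanticVariable("translation")

llm_request("Summarize: {article}", {"article": article}, summary)
llm_request("Translate to French: {summary}", {"summary": summary}, translation)
print(translation.value)
```

The point of the sketch is that the dependency between the two requests is visible to the service as a data-flow edge rather than being hidden inside the application, which is what opens up cross-request scheduling and pipelining optimizations.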
Related papers
- SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
- Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
In this study, we propose an LLM-based Generative IoT (GIoT) system deployed in a local network setting.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
- Large Language Models as Software Components: A Taxonomy for LLM-Integrated Applications [0.0]
Large Language Models (LLMs) have become widely adopted recently. Research explores their use both as autonomous agents and as tools for software engineering.
LLM-integrated applications, on the other hand, are software systems that leverage an LLM to perform tasks that would otherwise be impossible or require significant coding effort.
This study provides a taxonomy for LLM-integrated applications, offering a framework for analyzing and describing these systems.
arXiv Detail & Related papers (2024-06-13T21:32:56Z)
- Optimizing LLM Queries in Relational Workloads [58.254894049950366]
We show how to optimize Large Language Model (LLM) inference for analytical workloads that invoke LLMs within relational queries.
We implement these optimizations in Apache Spark, with vLLM as the model serving backend.
We achieve up to 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
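As a rough illustration of the setup this entry describes, the sketch below wraps an LLM call in a Spark UDF so it can be invoked from within a relational query. It is an assumption-laden stand-in rather than the paper's implementation: the `call_llm` stub replaces a real request to a serving backend such as vLLM.

```python
# Illustrative sketch only: invoking an LLM from within a relational query by
# wrapping the model call in a Spark UDF. `call_llm` is a stand-in for a
# request to a serving backend (e.g. an HTTP call to a vLLM server).
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def call_llm(prompt: str) -> str:
    # Placeholder for a call to an LLM serving endpoint; returns a canned
    # answer so the sketch is runnable without any model.
    return f"<summary of: {prompt[:40]}...>"

spark = SparkSession.builder.appName("llm-in-sql").getOrCreate()
summarize = udf(lambda text: call_llm(f"Summarize: {text}"), StringType())

reviews = spark.createDataFrame(
    [(1, "The battery lasts two days and charges quickly."),
     (2, "Screen cracked after a week; support was unhelpful.")],
    ["id", "review"],
)

# The LLM call appears as an ordinary column expression inside the query.
reviews.select("id", summarize("review").alias("summary")).show(truncate=False)
```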
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
- Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer [27.63746419563747]
Large language models (LLMs) are empowered with broad knowledge, but their task-specific performance is often suboptimal.
This necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns.
We propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data.
arXiv Detail & Related papers (2023-12-10T09:52:32Z)
- A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [30.774685501251817]
General large language models (LLMs) have demonstrated significant potential in tasks such as code generation in software engineering.
A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning.
There is currently a lack of systematic investigation into Code LLMs and their performance.
arXiv Detail & Related papers (2023-11-17T07:55:16Z)
- More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering [35.086135550672864]
We propose In-Context Sampling (ICS) to produce confident predictions by optimizing the construction of multiple ICL prompt inputs.
An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further improve LLM performance.
arXiv Detail & Related papers (2023-11-16T11:02:49Z)
- InfMLLM: A Unified Framework for Visual-Language Tasks [44.29407348046122]
Multimodal large language models (MLLMs) have attracted growing interest.
This work delves into enabling LLMs to tackle more vision-language-related tasks.
InfMLLM achieves either state-of-the-art (SOTA) performance or performance comparable to recent MLLMs.
arXiv Detail & Related papers (2023-11-12T09:58:16Z)
- FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs, and then introduces our package FS-LLM as the main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
- Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules (a rough sketch of the idea follows this entry).
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
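As a loose illustration of the check-and-revise loop the last entry describes, the sketch below augments a stubbed black-box LLM with retrieved evidence and automated feedback. All module names are hypothetical and do not come from the paper.

```python
# Minimal illustrative sketch (module names are hypothetical): augmenting a
# black-box LLM with external knowledge and automated feedback. The LLM is an
# opaque `generate(prompt)` function; a checker produces feedback that is
# appended to the prompt for a revised attempt.
from typing import Optional

def retrieve_evidence(question: str) -> str:
    # Stand-in for an external knowledge lookup (search engine, database, ...).
    return "Known fact: the Eiffel Tower is about 330 m tall."

def generate(prompt: str) -> str:
    # Stand-in for the black-box LLM.
    return "The Eiffel Tower is roughly 330 meters tall."

def check_facts(answer: str, evidence: str) -> Optional[str]:
    # Returns feedback if the answer conflicts with the evidence, else None.
    return None if "330" in answer else "Answer disagrees with the evidence."

def answer_with_feedback(question: str, max_rounds: int = 3) -> str:
    evidence = retrieve_evidence(question)
    prompt = f"Evidence: {evidence}\nQuestion: {question}"
    answer = ""
    for _ in range(max_rounds):
        answer = generate(prompt)
        feedback = check_facts(answer, evidence)
        if feedback is None:  # answer passes the automated check
            return answer
        prompt += f"\nFeedback: {feedback}\nRevise your answer."
    return answer

print(answer_with_feedback("How tall is the Eiffel Tower?"))
```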