OpenLambdaVerse: A Dataset and Analysis of Open-Source Serverless Applications
- URL: http://arxiv.org/abs/2508.01492v1
- Date: Sat, 02 Aug 2025 21:30:01 GMT
- Title: OpenLambdaVerse: A Dataset and Analysis of Open-Source Serverless Applications
- Authors: Angel C. Chavez-Moreno, Cristina L. Abad
- Abstract summary: OpenLambdaVerse is a dataset of GitHub repositories that use the Serverless Framework in applications that contain one or more Lambda functions. We gain important insights on the size and complexity of current applications, which languages and runtimes they employ, how the functions are triggered, the maturity of the projects, and their security practices.
- Score: 0.6215404942415159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function-as-a-Service (FaaS) is at the core of serverless computing, enabling developers to easily deploy applications without managing computing resources. With an Infrastructure-as-Code (IaC) approach, frameworks like the Serverless Framework use YAML configurations to define and deploy APIs, tasks, workflows, and event-driven applications on cloud providers, promoting zero-friction development. As with any rapidly evolving ecosystem, there is a need for updated insights into how these tools are used in real-world projects. Building on the methodology established by the Wonderless dataset for serverless computing (and applying multiple new filtering steps), OpenLambdaVerse addresses this gap by creating a dataset of current GitHub repositories that use the Serverless Framework in applications that contain one or more AWS Lambda functions. We then analyze and characterize this dataset to get an understanding of the state-of-the-art in serverless architectures based on this stack. Through this analysis we gain important insights on the size and complexity of current applications, which languages and runtimes they employ, how the functions are triggered, the maturity of the projects, and their security practices (or lack thereof). OpenLambdaVerse thus offers a valuable, up-to-date resource for both practitioners and researchers who seek to better understand evolving serverless workloads.
Related papers
- ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks [54.52092001110694]
ThinkGeo is a benchmark designed to evaluate tool-augmented agents on remote sensing tasks via structured tool use and multi-step planning. Inspired by tool-interaction paradigms, ThinkGeo includes human-curated queries spanning a wide range of real-world applications. Our analysis reveals notable disparities in tool accuracy and planning consistency across models.
arXiv Detail & Related papers (2025-05-29T17:59:38Z)
- LLM-Generated Microservice Implementations from RESTful API Definitions [3.740584607001637]
This paper presents a system that uses Large Language Models (LLMs) to automate the API-first development of software. The system generates an OpenAPI specification, generates server code from it, and refines the code through a feedback loop that analyzes execution logs and error messages. The system has the potential to help software developers, architects, and organizations speed up software development cycles.
arXiv Detail & Related papers (2025-02-13T20:50:33Z)
- LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World [0.0]
This paper studies the capability of Large Language Models to generate architectural components for Functions as a Service (FaaS). The small size of its architectural components makes this architectural style amenable to generation using current LLMs. We evaluate correctness through existing tests present in the repositories and use metrics from the Software Engineering (SE) and Natural Language Processing (NLP) domains.
arXiv Detail & Related papers (2025-02-04T18:06:04Z)
- The Compressor-Retriever Architecture for Language Model OS [20.56093501980724]
This paper explores the concept of using a language model as the core component of an operating system (OS).
A key challenge in realizing such an LM OS is managing the life-long context and ensuring statefulness across sessions.
We introduce compressor-retriever, a model-agnostic architecture designed for life-long context management.
arXiv Detail & Related papers (2024-09-02T23:28:15Z)
- CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases [13.733229886643041]
Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories.
Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge.
We introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories.
arXiv Detail & Related papers (2024-08-07T17:13:59Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub [79.31134731122462]
We introduce the OpenAct benchmark to evaluate open-domain task-solving capability, built on human expert consultation and repositories in GitHub. We present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains by autonomously integrating specialized tools from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
- TaskWeaver: A Code-First Agent Framework [50.99683051759488]
TaskWeaver is a code-first framework for building LLM-powered autonomous agents.
It converts user requests into executable code and treats user-defined plugins as callable functions.
It provides support for rich data structures, flexible plugin usage, and dynamic plugin selection.
arXiv Detail & Related papers (2023-11-29T11:23:42Z)
- CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs).
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z)
- Exploring the potential of flow-based programming for machine learning deployment in comparison with service-oriented architectures [8.677012233188968]
We argue that part of the reason is infrastructure that was not designed for activities around data collection and analysis.
We propose to consider flow-based programming with data streams as an alternative to commonly used service-oriented architectures for building software applications.
arXiv Detail & Related papers (2021-08-09T15:06:02Z)
- Large-Scale Intelligent Microservices [24.99695289157708]
We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives.
We provide large scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis.
arXiv Detail & Related papers (2020-09-17T03:38:28Z)