Free and Customizable Code Documentation with LLMs: A Fine-Tuning Approach
- URL: http://arxiv.org/abs/2412.00726v1
- Date: Sun, 01 Dec 2024 08:38:18 GMT
- Title: Free and Customizable Code Documentation with LLMs: A Fine-Tuning Approach
- Authors: Sayak Chakrabarty, Souradip Pal,
- Abstract summary: We present a large language model (LLM)-based application that developers can use as a support tool to generate basic documentation for any publicly available repository.
- Score: 0.0
- License:
- Abstract: Automated documentation of programming source code is a challenging task with significant practical and scientific implications for the developer community. We present a large language model (LLM)-based application that developers can use as a support tool to generate basic documentation for any publicly available repository. Over the last decade, several papers have been written on generating documentation for source code using neural network architectures. With the recent advancements in LLM technology, some open-source applications have been developed to address this problem. However, these applications typically rely on the OpenAI APIs, which incur substantial financial costs, particularly for large repositories. Moreover, none of these open-source applications offer a fine-tuned model or features to enable users to fine-tune. Additionally, finding suitable data for fine-tuning is often challenging. Our application addresses these issues which is available at https://pypi.org/project/readme-ready/.
Related papers
- LLM-Generated Microservice Implementations from RESTful API Definitions [3.740584607001637]
This paper presents a system that uses Large Language Models (LLMs) to automate the API-first development of software.
The system generates OpenAPI specification, generating server code from it, and refining the code through a feedback loop that analyzes execution logs and error messages.
The system has the potential to benefit software developers, architects, and organizations to speed up software development cycles.
arXiv Detail & Related papers (2025-02-13T20:50:33Z) - Human-In-the-Loop Software Development Agents [12.830816751625829]
Large Language Models (LLMs)-based multi-agent paradigms for software engineering are introduced to automatically resolve software development tasks.
In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development.
We design, implement, and deploy the HULA framework into Atlassian for internal uses.
arXiv Detail & Related papers (2024-11-19T23:22:33Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application [0.0]
UNCTAD has explored and developed its own open-source Retrieval Augmented Generation (RAG) LLM application.
RAG makes Large Language Models aware of and more useful for the organization's domain and work.
Three libraries developed to produce the app, nlp_pipeline for document processing and statistical analysis, local_rag_llm for running a local RAG LLM, and streamlit_rag for the user interface, are publicly available on PyPI and GitHub with Dockerfiles.
arXiv Detail & Related papers (2024-06-18T14:23:54Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM)
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z) - SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z) - Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation.
Official API documentation serves as a primary source of information for developers, but it can often be extensive and lacks user-friendliness.
Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Calculating Originality of LLM Assisted Source Code [0.0]
We propose a neural network-based tool to determine the original effort (and LLM's contribution) put by students in writing source codes.
Our tool is motivated by minimum description length measures like Kolmogorov complexity.
arXiv Detail & Related papers (2023-07-10T11:30:46Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.