Related papers: Adapting Installation Instructions in Rapidly Evolving Software Ecosystems

URL: http://arxiv.org/abs/2312.03250v3
Date: Tue, 07 Jan 2025 09:20:45 GMT
Title: Adapting Installation Instructions in Rapidly Evolving Software Ecosystems
Authors: Haoyu Gao, Christoph Treude, Mansooreh Zahedi
Abstract summary: We conducted a study investigating GitHub repositories with 1,163 commits that focused on updates in installation-related sections. Our research revealed six major categories of changes in the commits, namely pre-installation instructions, installation instructions, post-installation instructions, help information updates, document presentation, and external resource management. We propose a template to cover installation-related sections for documentation maintainers to reference when updating documents.
Score: 9.982895603207993
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: README files play an important role in providing installation-related instructions to software users and are widely used in open source software systems on platforms such as GitHub. However, these files often suffer from various documentation issues, leading to challenges in comprehension and potential errors in content. Despite their significance, there is a lack of systematic understanding regarding the documentation efforts invested in README files, especially in the context of installation-related instructions, which are crucial for users to start with a software project. To fill the research gap, we conducted a qualitative study, investigating 400 GitHub repositories with 1,163 README commits that focused on updates in installation-related sections. Our research revealed six major categories of changes in the README commits, namely pre-installation instructions, installation instructions, post-installation instructions, help information updates, document presentation, and external resource management. We further provide detailed insights into modification behaviours and offer examples of these updates. Based on our findings, we propose a README template tailored to cover the installation-related sections for documentation maintainers to reference when updating documents. We further validate this template by conducting an online survey, identifying that documentation readers find the augmented documents based on our template are generally of better quality. We further provide recommendations to practitioners for maintaining their README files, as well as motivations for future research directions... (too long for arxiv)

Related papers

QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder. We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
The Introduction of README and CONTRIBUTING Files in Open Source Software Development [1.5024443617567174]
CONTRIBUTING files can serve as the first point of contact for potential contributors to free/libre and open source software (FLOSS) projects. Prominent open source software organizations such as Mozilla, GitHub, and the Linux Foundation advocate that projects provide community-focused and process-oriented documentation early to foster recruitment and activity.
arXiv Detail & Related papers (2025-02-25T18:33:52Z)
Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories [11.06441376653589]
This study presents the first empirical analysis of prompt evolution in LLM-integrated software development. We analyzed 1,262 prompt changes across 243 GitHub repositories to investigate the patterns and frequencies of prompt changes. Our findings show that developers primarily evolve prompts through additions and modifications, with most changes occurring during feature development.
arXiv Detail & Related papers (2024-12-23T05:41:01Z)
Supporting Software Maintenance with Dynamically Generated Document Hierarchies [41.407915858583344]
We present HGEN, a fully automated pipeline that transforms source code through a series of six stages into a well-organized hierarchy of formatted documents. We evaluate HGEN both quantitatively and qualitatively. Results show that HGEN produces artifact hierarchies similar in quality to manually constructed documentation, with much higher coverage of the core concepts than the baseline approach.
arXiv Detail & Related papers (2024-08-11T17:11:14Z)
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems. Our benchmark involves the recruitment of human annotators and the generation of synthetic questions. It includes 229 real documents and 1,102 questions, spanning across five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding [21.916774808384893]
The proposed layout instruction tuning strategy consists of two components: layout-aware Pre-training and layout-aware Supervised Finetuning. Experiments on standard benchmarks show that the proposed LayoutLLM significantly outperforms existing methods that adopt open-source 7B LLMs/MLLMs for document understanding.
arXiv Detail & Related papers (2024-04-08T06:40:28Z)
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions [71.5977045423177]
We study the use of instructions in Information Retrieval systems. We introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark. We show that it is possible for IR models to learn to follow complex instructions.
arXiv Detail & Related papers (2024-03-22T14:42:29Z)
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. We present InkSync, an editing interface that suggests executable edits directly within the document being edited.
arXiv Detail & Related papers (2023-09-27T00:56:17Z)
Evaluating Transfer Learning for Simplifying GitHub READMEs [11.219774223416648]
This study explores the potential of text simplification techniques in the domain of software engineering to automatically simplify GitHub files. We collected software-related pairs of GitHub files consisting of 14,588 entries, aligned difficult sentences with their simplified counterparts, and trained a Transformer-based model to automatically simplify difficult versions. Using automated BLEU scores and human evaluation, we compared the performance of different transfer learning schemes and the baseline models without transfer learning.
arXiv Detail & Related papers (2023-08-19T08:20:41Z)
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [55.4806974284156]
Document understanding refers to automatically extracting, analyzing, and comprehending information from digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs) have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition.
arXiv Detail & Related papers (2023-07-04T11:28:07Z)
Envisioning the Next-Gen Document Reader [41.35737889497044]
We present our vision for the next-gen document reader that strives to enhance user understanding and create a more connected, trustworthy information experience. We describe 18 NLP-powered features to add to existing document readers and propose a novel plug-in marketplace that allows users to further customize their reading experience.
arXiv Detail & Related papers (2023-02-15T06:43:12Z)
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
FRUIT: Faithfully Reflecting Updated Information in Text [106.40177769765512]
We introduce the novel generation task of *faithfully reflecting updated information in text* (FRUIT). Our analysis shows that developing models that can update articles faithfully requires new capabilities for neural generation models.
arXiv Detail & Related papers (2021-12-16T05:21:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.