Related papers: A Study of Documentation for Software Architecture

A Study of Documentation for Software Architecture

URL: http://arxiv.org/abs/2305.17286v1
Date: Fri, 26 May 2023 22:14:53 GMT
Title: A Study of Documentation for Software Architecture
Authors: Neil A. Ernst and Martin P. Robillard
Abstract summary: We asked 65 participants to answer software architecture understanding questions. Answers to questions that require applying and creating activities were statistically significantly associated with the use of the system's source code. We conclude that, in the limited experimental context studied, our results contradict the hypothesis that the format of architectural documentation matters.
Score: 7.011803832284996
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Documentation is an important mechanism for disseminating software architecture knowledge. Software project teams can employ vastly different formats for documenting software architecture, from unstructured narratives to standardized documents. We explored to what extent this documentation format may matter to newcomers joining a software project and attempting to understand its architecture. We conducted a controlled questionnaire-based study wherein we asked 65 participants to answer software architecture understanding questions using one of two randomly-assigned documentation formats: narrative essays, and structured documents. We analyzed the factors associated with answer quality using a Bayesian ordered categorical regression and observed no significant association between the format of architecture documentation and performance on architecture understanding tasks. Instead, prior exposure to the source code of the system was the dominant factor associated with answer quality. We also observed that answers to questions that require applying and creating activities were statistically significantly associated with the use of the system's source code to answer the question, whereas the document format or level of familiarity with the system were not. Subjective sentiment about the documentation format was comparable: Although more participants agreed that the structured document was easier to navigate and use for writing code, this relation was not statistically significant. We conclude that, in the limited experimental context studied, our results contradict the hypothesis that the format of architectural documentation matters. We surface two more important factors related to effective use of software architecture documentation: prior familiarity with the source code, and the type of architectural information sought.

Related papers

UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents [35.00880322661589]
We propose a Unified Facet-Aware Retrieval framework to support doc-doc and q-doc SDR within a single architecture.<n>UniFAR reconciles document structure with question intent via learnable facet anchors, and unifies doc-doc and q-doc supervision through joint training.
arXiv Detail & Related papers (2026-02-27T07:44:02Z)
MoDora: Tree-Based Semi-Structured Document Analysis System [62.01015188258797]
Semi-structured documents integrate diverse interleaved data elements arranged in various and often irregular layouts.<n>MoDora is an LLM-powered system for semi-structured document analysis.<n> Experiments show MoDora outperforms baselines by 5.97%-61.07% in accuracy.
arXiv Detail & Related papers (2026-02-26T14:48:49Z)
A document is worth a structured record: Principled inductive bias design for document recognition [3.4332178437507936]
State-of-the-art approaches treat document recognition as a computer vision problem.<n>We suggest a novel perspective that frames document recognition as a transcription task from a document to a record.<n>This implies a natural grouping of documents based on the intrinsic structure inherent in their transcription.
arXiv Detail & Related papers (2025-07-11T10:02:08Z)
DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding.<n>Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
Semi-Automated Design of Data-Intensive Architectures [49.1574468325115]
This paper introduces a development methodology for data-intensive architectures. It guides architects in (i) designing a suitable architecture for their specific application scenario, and (ii) selecting an appropriate set of concrete systems to implement the application. We show that the description languages we adopt can capture the key aspects of data-intensive architectures proposed by researchers and practitioners.
arXiv Detail & Related papers (2025-03-21T16:01:11Z)
The Software Documentor Mindset [10.55519622264761]
We interviewed 26 volunteer documentation contributors, i.e. documentors, to understand why and how they create such documentation. We identified sixteen considerations that documentors have during the documentation contribution process, along three dimensions, namely motivations, topic selection techniques, and styling objectives. We propose a structure of mindsets, and their associated considerations across the three dimensions, as a framework for reasoning about the documentation contribution process.
arXiv Detail & Related papers (2024-12-12T16:28:08Z)
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction [23.47150047875133]
Document parsing is essential for converting unstructured and semi-structured documents into machine-readable data. Document parsing plays an indispensable role in both knowledge base construction and training data generation. This paper discusses the challenges faced by modular document parsing systems and vision-language models in handling complex layouts.
arXiv Detail & Related papers (2024-10-28T16:11:35Z)
Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings. First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss. Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis [9.340346869932434]
We propose a tree construction based approach that addresses multiple subtasks concurrently. We present an effective end-to-end solution based on this framework to demonstrate its performance. Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets.
arXiv Detail & Related papers (2024-01-22T12:00:37Z)
PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. We propose PDFTriage that enables models to retrieve the context based on either structure or content. Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search [12.244685291395093]
In e-commerce search, relevance between query and documents is an essential requirement for satisfying user experience. We propose a novel two-stage pretraining and matching architecture for relevance matching with rich structured documents. The model has already been deployed online, serving the search traffic of Meituan for over a year.
arXiv Detail & Related papers (2023-08-15T11:45:34Z)
DLUE: Benchmarking Document Language Understanding [32.550855843975484]
There is no well-established consensus on how to comprehensively evaluate document understanding abilities. This paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription. Under the new evaluation framework, we propose textbfDocument Language Understanding Evaluation -- textbfDLUE, a new task suite.
arXiv Detail & Related papers (2023-05-16T15:16:24Z)
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
Evaluation of a Region Proposal Architecture for Multi-task Document Layout Analysis [0.685316573653194]
Mask-RCNN architecture is designed to address the problem of baseline detection and region segmentation. We present experimental results on two handwritten text datasets and one handwritten music dataset. The analyzed architecture yields promising results, outperforming state-of-the-art techniques in all three datasets.
arXiv Detail & Related papers (2021-06-22T14:07:27Z)
Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text. In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.