Understanding Documentation Use Through Log Analysis: An Exploratory
Case Study of Four Cloud Services
- URL: http://arxiv.org/abs/2310.10817v2
- Date: Thu, 29 Feb 2024 23:41:27 GMT
- Authors: Daye Nam, Andrew Macvean, Brad Myers, and Bogdan Vasilescu
- Abstract summary: We analyze documentation page-view logs from four cloud-based industrial services.
By analyzing page-view logs for over 100,000 users, we find diverse patterns of documentation page visits.
We propose documentation page-view log analysis as a feasible technique for design audits of documentation.
- Score: 14.104545948572836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Almost no modern software system is written from scratch, and developers are
required to effectively learn to use third-party libraries or software
services. Thus, many practitioners and researchers have looked for ways to
create effective documentation that supports developers' learning. However, few
efforts have focused on how people actually use the documentation. In this
paper, we report on an exploratory, multi-phase, mixed methods empirical study
of documentation page-view logs from four cloud-based industrial services. By
analyzing page-view logs for over 100,000 users, we find diverse patterns of
documentation page visits. Moreover, we show statistically that which
documentation pages people visit often correlates with user characteristics
such as past experience with the specific product, on the one hand, and with
future adoption of the API on the other hand. We discuss the implications of
these results on documentation design and propose documentation page-view log
analysis as a feasible technique for design audits of documentation, from ones
written for software developers to ones designed to support end users (e.g.,
Adobe Photoshop).
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- Does Documentation Matter? An Empirical Study of Practitioners' Perspective on Open-Source Software Adoption [4.400274233826898]
Open-source software (OSS) has become increasingly prevalent in developing software products.
We conducted semi-structured interviews and an online survey to provide insight into this area.
We developed a topic model to collect relevant information from OSS documentation automatically.
We propose a novel information augmentation approach, DocMentor, by combining OSS documentation corpus-IDF scores and ChatGPT.
arXiv Detail & Related papers (2024-03-06T16:06:08Z)
- Non Linear Software Documentation with Interactive Code Examples [9.880887106904519]
Casdoc documents are interactive resources centered around code examples for programmers.
Explanations of the code elements are presented as annotations that the readers reveal based on their needs.
We observed that interactive documents can contain more information than static documents without being distracting to readers.
arXiv Detail & Related papers (2023-11-29T20:08:46Z)
- HADES: Homologous Automated Document Exploration and Summarization [3.3509104620016092]
HADES is designed to streamline the work of professionals dealing with large volumes of documents.
The tool employs a multi-step pipeline that begins with processing PDF documents using topic modeling, summarization, and analysis of the most important words for each topic.
arXiv Detail & Related papers (2023-02-25T15:16:10Z)
- Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval [79.37614949970013]
We propose a new dense retrieval model which learns diverse document representations with deep query interactions.
Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
arXiv Detail & Related papers (2022-08-08T16:00:55Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability [8.875661788022637]
We propose a set of design guidelines that aim to support the documentation practice for machine learning models.
A prototype tool named DocML follows those guidelines to support model development in computational notebooks.
arXiv Detail & Related papers (2022-04-13T14:39:18Z)
- Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
arXiv Detail & Related papers (2021-11-11T01:58:44Z)
- DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)