A Systematic Literature Review on the Impact of Formatting Elements on
Code Legibility
- URL: http://arxiv.org/abs/2208.12141v3
- Date: Thu, 1 Jun 2023 12:02:22 GMT
- Title: A Systematic Literature Review on the Impact of Formatting Elements on
Code Legibility
- Authors: Delano Oliveira, Reydne Santos, Fernanda Madeiral, Hidehiko Masuhara,
Fernando Castor
- Abstract summary: We conducted a systematic literature review and identified 15 papers containing human-centric studies.
For identifier style, we found divergent results: one study found a significant difference in favor of camel case, while another found a positive result in favor of snake case.
- Score: 80.60259721973748
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Context: Software programs can be written in different but functionally
equivalent ways. Even though previous research has compared specific formatting
elements to find out which alternatives affect code legibility, seeing the
bigger picture of what makes code more or less legible is challenging. Goal: We
aim to find which formatting elements have been investigated in empirical
studies and which alternatives were found to be more legible for human
subjects. Method: We conducted a systematic literature review and identified 15
papers containing human-centric studies that directly compared alternative
formatting elements. We analyzed and organized these formatting elements using
a card-sorting method. Results: We identified 13 formatting elements (e.g.,
indentation) and 33 levels of formatting elements (e.g., two-space
indentation), which are about formatting styles, spacing, block delimiters,
long or complex code lines, and word boundary styles. While some levels were
found to be statistically better than other equivalent ones in terms of code
legibility, e.g., appropriate use of indentation with blocks, others were not,
e.g., formatting layout. For identifier style, we found divergent results,
where one study found a significant difference in favor of camel case, while
another study found a positive result in favor of snake case. Conclusion: The
number of identified papers, some of which are outdated, and the many null and
contradictory results emphasize the relative lack of work in this area and
underline the importance of more research. There is much to be understood about
how formatting elements influence code legibility before the creation of
guidelines and automated aids to help developers make their code more legible.
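To make the abstract's notion of "functionally equivalent but differently formatted" code concrete, the sketch below shows two versions of the same routine that differ only in formatting elements of the kind the review catalogues: identifier style (camel case vs. snake case), spacing around operators, and the breaking of a long expression. The function names and example data are illustrative, not drawn from any reviewed study.

```python
# Two functionally equivalent implementations that differ only in formatting:
# identifier style, operator spacing, and line breaking of a long expression.

def totalPrice(items,taxRate):  # camel case, no spaces around operators
    return sum(p for _,p in items)*(1+taxRate)

def total_price(items, tax_rate):
    # snake case, spaced operators, long expression split across lines
    subtotal = sum(price for _, price in items)
    return subtotal * (1 + tax_rate)

cart = [("book", 10.0), ("pen", 2.0)]
assert totalPrice(cart, 0.1) == total_price(cart, 0.1)  # behavior is identical
```

The empirical question the reviewed studies ask is which of such equivalent presentations human readers find more legible, e.g., whether `total_price` is parsed more quickly than `totalPrice`.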
Related papers
- Strategies for Span Labeling with Large Language Models [0.19116784879310025]
Large language models (LLMs) are increasingly used for text analysis tasks, such as named entity recognition or error detection.
Unlike encoder-based models, generative architectures lack an explicit mechanism to refer to specific parts of their input.
In this paper, we categorize these strategies into three families: tagging the input text, indexing numerical positions of spans, and matching span content.
To address the limitations of content matching, we introduce LogitMatch, a new constrained decoding method that forces the model's output to align with valid input spans.
arXiv Detail & Related papers (2026-01-23T18:03:10Z) - The Medium Is Not the Message: Deconfounding Text Embeddings via Linear Concept Erasure [91.01653854955286]
Embedding-based similarity metrics can be influenced by spurious attributes like the text's source or language.
This paper shows that a debiasing algorithm that removes information about observed confounders from the encoder representations substantially reduces these biases at a minimal computational cost.
arXiv Detail & Related papers (2025-07-01T23:17:12Z) - Innamark: A Whitespace Replacement Information-Hiding Method [0.0]
We introduce a novel method for information hiding called Innamark.
Innamark can conceal any byte-encoded sequence within a sufficiently long cover text.
We propose a specified structure for secret messages that enables compression, encryption, hashing, and error correction.
arXiv Detail & Related papers (2025-02-18T10:21:27Z) - Studying and Recommending Information Highlighting in Stack Overflow Answers [47.98908661334215]
We studied 31,169,429 answers of Stack Overflow.
For training recommendation models, we choose CNN-based and BERT-based models for each type of formatting.
Our models achieve a precision ranging from 0.50 to 0.72 for different formats.
arXiv Detail & Related papers (2024-01-03T00:13:52Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z) - Entry Separation using a Mixed Visual and Textual Language Model:
Application to 19th century French Trade Directories [18.323615434182553]
A key challenge is to correctly segment what constitutes the basic text regions for the target database.
We propose a new pragmatic approach whose efficiency is demonstrated on 19th century French Trade Directories.
By injecting special visual tokens, coding, for instance, indentation or breaks, into the token stream of the language model used for NER purpose, we can leverage both textual and visual knowledge simultaneously.
arXiv Detail & Related papers (2023-02-17T15:30:44Z) - Precise Zero-Shot Dense Retrieval without Relevance Labels [60.457378374671656]
Hypothetical Document Embeddings(HyDE) is a zero-shot dense retrieval system.
We show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever.
arXiv Detail & Related papers (2022-12-20T18:09:52Z) - Searching for Discriminative Words in Multidimensional Continuous
Feature Space [0.0]
We propose a novel method to extract discriminative keywords from documents.
We show how different discriminative metrics influence the overall results.
We conclude that word feature vectors can substantially improve the topical inference of documents' meaning.
arXiv Detail & Related papers (2022-11-26T18:05:11Z) - Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study
in Polish [0.0]
Inflected languages make word forms sparse, making most statistical procedures complicated.
This paper examines the usefulness of grammatical features (as assessed via POS-tag n-grams) and lemmatized forms in recognizing authorial style profiles.
arXiv Detail & Related papers (2022-06-05T15:48:16Z) - Neural Graph Matching for Modification Similarity Applied to Electronic
Document Comparison [0.0]
Document comparison is a common task in the legal and financial industries.
In this paper, we present a novel neural graph matching approach applied to document comparison.
arXiv Detail & Related papers (2022-04-12T02:37:54Z) - Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for the bottom-up based scene text detection methods.
Under the unified framework, we ensure the consistent settings for non-core modules.
With the comprehensive investigations and elaborate analyses, it reveals the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z) - Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weakly labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.