A Systematic Literature Review on the Impact of Formatting Elements on
Code Legibility
- URL: http://arxiv.org/abs/2208.12141v3
- Date: Thu, 1 Jun 2023 12:02:22 GMT
- Title: A Systematic Literature Review on the Impact of Formatting Elements on
Code Legibility
- Authors: Delano Oliveira, Reydne Santos, Fernanda Madeiral, Hidehiko Masuhara,
Fernando Castor
- Abstract summary: We conducted a systematic literature review and identified 15 papers containing human-centric studies.
For identifier style, we found divergent results: one study found a significant difference in favor of camel case, while another found a positive result in favor of snake case.
- Score: 80.60259721973748
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Context: Software programs can be written in different but functionally
equivalent ways. Even though previous research has compared specific formatting
elements to find out which alternatives affect code legibility, seeing the
bigger picture of what makes code more or less legible is challenging. Goal: We
aim to find which formatting elements have been investigated in empirical
studies and which alternatives were found to be more legible for human
subjects. Method: We conducted a systematic literature review and identified 15
papers containing human-centric studies that directly compared alternative
formatting elements. We analyzed and organized these formatting elements using
a card-sorting method. Results: We identified 13 formatting elements (e.g.,
indentation) and 33 levels of formatting elements (e.g., two-space
indentation), which are about formatting styles, spacing, block delimiters,
long or complex code lines, and word boundary styles. While some levels were
found to be statistically better than other equivalent ones in terms of code
legibility, e.g., appropriate use of indentation with blocks, others were not,
e.g., formatting layout. For identifier style, we found divergent results,
where one study found a significant difference in favor of camel case, while
another study found a positive result in favor of snake case. Conclusion: The
number of identified papers, some of which are outdated, and the many null and
contradictory results emphasize the relative lack of work in this area and
underline the importance of more research. There is much to be understood about
how formatting elements influence code legibility before the creation of
guidelines and automated aids to help developers make their code more legible.
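To make the abstract's notion of "functionally equivalent but differently formatted" code concrete, the sketch below shows two versions of the same routine that differ only in formatting elements of the kind the review catalogues: identifier style (camel case vs. snake case), spacing around operators, and the breaking of a long expression. The function names and example data are illustrative, not drawn from any reviewed study.

```python
# Two functionally equivalent implementations that differ only in formatting:
# identifier style, operator spacing, and line breaking of a long expression.

def totalPrice(items,taxRate):  # camel case, no spaces around operators
    return sum(p for _,p in items)*(1+taxRate)

def total_price(items, tax_rate):
    # snake case, spaced operators, long expression split across lines
    subtotal = sum(price for _, price in items)
    return subtotal * (1 + tax_rate)

cart = [("book", 10.0), ("pen", 2.0)]
assert totalPrice(cart, 0.1) == total_price(cart, 0.1)  # behavior is identical
```

The empirical question the reviewed studies ask is which of such equivalent presentations human readers find more legible, e.g., whether `total_price` is parsed more quickly than `totalPrice`.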
Related papers
- Strategies for Span Labeling with Large Language Models [0.19116784879310025]
Large language models (LLMs) are increasingly used for text analysis tasks, such as named entity recognition or error detection.
Unlike encoder-based models, generative architectures lack an explicit mechanism to refer to specific parts of their input.
In this paper, we categorize these strategies into three families: tagging the input text, indexing numerical positions of spans, and matching span content.
To address the limitations of content matching, we introduce LogitMatch, a new constrained decoding method that forces the model's output to align with valid input spans.
arXiv Detail & Related papers (2026-01-23T18:03:10Z) - The Medium Is Not the Message: Deconfounding Text Embeddings via Linear Concept Erasure [91.01653854955286]
Embedding-based similarity metrics can be influenced by spurious attributes like the text's source or language.
This paper shows that a debiasing algorithm that removes information about observed confounders from the encoder representations substantially reduces these biases at a minimal computational cost.
arXiv Detail & Related papers (2025-07-01T23:17:12Z) - Innamark: A Whitespace Replacement Information-Hiding Method [0.0]
We introduce a novel method for information hiding called Innamark.
Innamark can conceal any byte-encoded sequence within a sufficiently long cover text.
We propose a specified structure for secret messages that enables compression, encryption, hashing, and error correction.
arXiv Detail & Related papers (2025-02-18T10:21:27Z) - Studying and Recommending Information Highlighting in Stack Overflow Answers [47.98908661334215]
We studied 31,169,429 answers of Stack Overflow.
For training recommendation models, we choose CNN-based and BERT-based models for each type of formatting.
Our models achieve a precision ranging from 0.50 to 0.72 for different formats.
arXiv Detail & Related papers (2024-01-03T00:13:52Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z) - Entry Separation using a Mixed Visual and Textual Language Model:
Application to 19th century French Trade Directories [18.323615434182553]
A key challenge is to correctly segment what constitutes the basic text regions for the target database.
We propose a new pragmatic approach whose efficiency is demonstrated on 19th century French Trade Directories.
By injecting special visual tokens, coding, for instance, indentation or breaks, into the token stream of the language model used for NER purpose, we can leverage both textual and visual knowledge simultaneously.
arXiv Detail & Related papers (2023-02-17T15:30:44Z) - Precise Zero-Shot Dense Retrieval without Relevance Labels [60.457378374671656]
Hypothetical Document Embeddings(HyDE) is a zero-shot dense retrieval system.
We show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever.
arXiv Detail & Related papers (2022-12-20T18:09:52Z) - Searching for Discriminative Words in Multidimensional Continuous
Feature Space [0.0]
We propose a novel method to extract discriminative keywords from documents.
We show how different discriminative metrics influence the overall results.
We conclude that word feature vectors can substantially improve the topical inference of documents' meaning.
arXiv Detail & Related papers (2022-11-26T18:05:11Z) - Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study
in Polish [0.0]
Inflected languages make word forms sparse, making most statistical procedures complicated.
This paper examines the usefulness of grammatical features (as assessed via POS-tag n-grams) and lemmatized forms in recognizing authorial style profiles.
arXiv Detail & Related papers (2022-06-05T15:48:16Z) - Neural Graph Matching for Modification Similarity Applied to Electronic
Document Comparison [0.0]
Document comparison is a common task in the legal and financial industries.
In this paper, we present a novel neural graph matching approach applied to document comparison.
arXiv Detail & Related papers (2022-04-12T02:37:54Z) - Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for the bottom-up based scene text detection methods.
Under the unified framework, we ensure the consistent settings for non-core modules.
With the comprehensive investigations and elaborate analyses, it reveals the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z) - Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weakly labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.