Identifier Name Similarities: An Exploratory Study
- URL: http://arxiv.org/abs/2507.18081v1
- Date: Thu, 24 Jul 2025 04:13:26 GMT
- Title: Identifier Name Similarities: An Exploratory Study
- Authors: Carol Wong, Mai Abe, Silvia De Benedictis, Marissa Halim, Anthony Peruma,
- Abstract summary: We present our preliminary findings on the occurrence of identifier name similarity in software projects.<n>We envision our initial taxonomy providing researchers with a platform to analyze and evaluate the impact of identifier name similarity on code comprehension, maintainability, and collaboration among developers.
- Score: 3.7420775485568294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifier names, which comprise a significant portion of the codebase, are the cornerstone of effective program comprehension. However, research has shown that poorly chosen names can significantly increase cognitive load and hinder collaboration. Even names that appear readable in isolation may lead to misunderstandings in contexts when they closely resemble other names in either structure or functionality. In this exploratory study, we present our preliminary findings on the occurrence of identifier name similarity in software projects through the development of a taxonomy that categorizes different forms of identifier name similarity. We envision our initial taxonomy providing researchers with a platform to analyze and evaluate the impact of identifier name similarity on code comprehension, maintainability, and collaboration among developers, while also allowing for further refinement and expansion of the taxonomy.
Related papers
- On the Structure and Semantics of Identifier Names Containing Closed Syntactic Category Words [19.94735883254009]
This paper investigates the linguistic structure of identifier names by extending the concept of grammar patterns.<n>The specific focus is on closed syntactic categories, which are rarely studied in software engineering.<n>The relationship between closed-category grammar patterns and program behavior is then analyzed using grounded-theory-inspired coding, statistical, and pattern analysis.
arXiv Detail & Related papers (2025-05-24T00:58:50Z) - Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$)
GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - RENAS: Prioritizing Co-Renaming Opportunities of Identifiers [1.1688548469063846]
This study introduces a technique called RENAS, which identifies and recommends related identifiers that should be renamed simultaneously in Java applications.
ReNAS determines priority scores for renaming candidates based on the relationships and similarities among identifiers.
ReNAS demonstrated an improvement in the F1-measure by more than 0.11 compared with existing renaming recommendation approaches.
arXiv Detail & Related papers (2024-08-19T06:13:09Z) - Reproducing, Extending, and Analyzing Naming Experiments [0.23456696459191312]
A recent study on how developers choose names collected the names given by different developers for the same objects.
This enabled a study of these names' diversity and structure, and the construction of a model of how names are created.
We reproduce different parts of this study in three independent experiments.
arXiv Detail & Related papers (2024-02-15T15:39:54Z) - Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models.
arXiv Detail & Related papers (2024-01-23T17:58:38Z) - Multiview Identifiers Enhanced Generative Retrieval [78.38443356800848]
generative retrieval generates identifier strings of passages as the retrieval target.
We propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage.
Our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
arXiv Detail & Related papers (2023-05-26T06:50:21Z) - Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z) - Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous
Academic Networks [81.00481125272098]
We introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem.
MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework.
Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task.
arXiv Detail & Related papers (2020-08-30T06:08:20Z) - OCoR: An Overlapping-Aware Code Retriever [15.531119719750807]
Given a natural language description, code retrieval aims to search for the most relevant code among a set of code.
Existing state-of-the-art approaches apply neural networks to code retrieval.
We propose a novel neural architecture named OCoR, where we introduce two specifically-designed components to capture overlaps.
arXiv Detail & Related papers (2020-08-12T09:43:35Z) - Interpretability Analysis for Named Entity Recognition to Understand
System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone and find that, while people are not able to infer the entity type either for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.