Reproducing, Extending, and Analyzing Naming Experiments
- URL: http://arxiv.org/abs/2402.10022v1
- Date: Thu, 15 Feb 2024 15:39:54 GMT
- Title: Reproducing, Extending, and Analyzing Naming Experiments
- Authors: Rachel Alpern, Ido Lazer, Issar Tzachor, Hanit Hakim, Sapir Weissbuch,
and Dror G. Feitelson
- Abstract summary: A recent study on how developers choose names collected the names given by different developers for the same objects.
This enabled a study of these names' diversity and structure, and the construction of a model of how names are created.
We reproduce different parts of this study in three independent experiments.
- Score: 0.23456696459191312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Naming is very important in software development, as names are often the only
vehicle of meaning about what the code is intended to do. A recent study on how
developers choose names collected the names given by different developers for
the same objects. This enabled a study of these names' diversity and structure,
and the construction of a model of how names are created. We reproduce
different parts of this study in three independent experiments. Importantly, we
employ methodological variations rather than striving for an exact replication.
When the same results are obtained, this boosts our confidence in their
validity by demonstrating that they do not depend on the methodology.
Our results indeed corroborate those of the original study in terms of the
diversity of names, the low probability of two developers choosing the same
name, and the finding that experienced developers tend to use slightly longer
names than inexperienced students. We explain name diversity by performing a
new analysis of the names, classifying the concepts represented in them as
universal (agreed upon), alternative (reflecting divergent views on a topic),
or optional (reflecting divergent opinions on whether to include this concept
at all). This classification enables new research directions concerning the
considerations involved in naming decisions. We also show that explicitly using
the model proposed in the original study to guide naming leads to the creation
of better names, whereas the simpler approach of just asking participants to
use longer and more detailed names does not.
Related papers
- Renovating Names in Open-Vocabulary Segmentation Benchmarks [31.243790558954288]
We present a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE).
Our framework features a renaming model that enhances the quality of names for each visual segment.
We show that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement.
arXiv Detail & Related papers (2024-03-14T17:35:32Z)
- Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models.
arXiv Detail & Related papers (2024-01-23T17:58:38Z)
- The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese [4.6112416098164255]
We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images.
We investigate how familiarity with a given kind of object relates to the degree of naming variation it triggers across subjects.
arXiv Detail & Related papers (2023-11-16T20:13:24Z)
- RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring [57.8069006460087]
We study automatic rename on variable names, which is considered more challenging than other rename activities.
We propose RefBERT, a two-stage pre-trained framework for rename on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
arXiv Detail & Related papers (2023-05-28T12:29:39Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives [13.266320447769564]
Name ambiguity is common in academic digital libraries, such as multiple authors having the same name.
The proposed method is mainly based on representation learning for heterogeneous networks and clustering.
The semantic representation is generated using NLP tools.
arXiv Detail & Related papers (2022-12-24T11:22:34Z)
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning [84.70916463298109]
VarCLR is a new approach for learning semantic representations of variable names.
VarCLR is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs.
We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT.
arXiv Detail & Related papers (2021-12-05T18:40:32Z)
- Novel Aficionados and Doppelgängers: a referential task for semantic representations of individual entities [0.0]
We show that the semantic distinction between proper names and common nouns is reflected in their linguistic distributions.
The results indicate that the distributional representations of different individual entities are less clearly distinguishable from each other than those of common nouns.
arXiv Detail & Related papers (2021-04-20T22:24:19Z)
- SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery [66.24624547470175]
SynSetExpan is a novel framework that enables two tasks to mutually enhance each other.
We create the first large-scale Synonym-Enhanced Set Expansion dataset via crowdsourcing.
Experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
arXiv Detail & Related papers (2020-09-29T07:32:17Z)
- How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning [4.769747792846004]
SpokenName2Vec is a novel and generic approach which addresses the similar name suggestion problem.
The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames.
The performance of the proposed approach was found to be superior to 10 other algorithms evaluated in this study.
arXiv Detail & Related papers (2020-05-24T20:39:00Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone. While people are also unable to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.