Deep Learning-Based Identification of Inconsistent Method Names: How Far Are We?
- URL: http://arxiv.org/abs/2501.12617v1
- Date: Wed, 22 Jan 2025 03:51:56 GMT
- Title: Deep Learning-Based Identification of Inconsistent Method Names: How Far Are We?
- Authors: Taiming Wang, Yuxia Zhang, Lin Jiang, Yi Tang, Guangjie Li, Hui Liu
- Abstract summary: We present an empirical study that evaluates state-of-the-art deep learning methods for identifying inconsistent method names. We create a new benchmark by combining automatic identification from commit histories and manual developer inspections. Our results show that performance drops substantially when moving from the balanced dataset to the new benchmark.
- Score: 6.199085977656376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concise and meaningful method names are crucial for program comprehension and maintenance. However, method names may become inconsistent with their corresponding implementations, causing confusion and errors. Several deep learning (DL)-based approaches have been proposed to identify such inconsistencies, with initial evaluations showing promising results. However, these evaluations typically use a balanced dataset, where the numbers of inconsistent and consistent names are equal. This setup, along with flawed dataset construction, leads to false positives, making reported performance less reliable in real-world scenarios, where most method names are consistent. In this paper, we present an empirical study that evaluates state-of-the-art DL-based methods for identifying inconsistent method names. We create a new benchmark by combining automatic identification from commit histories and manual developer inspections, reducing false positives. We evaluate five representative DL approaches (one retrieval-based and four generation-based) on this benchmark. Our results show that performance drops substantially when moving from the balanced dataset to the new benchmark. We further conduct quantitative and qualitative analyses to understand the strengths and weaknesses of the approaches. Retrieval-based methods perform well on simple methods and those with popular name sub-tokens but fail due to inefficient representation techniques. Generation-based methods struggle with inaccurate similarity calculations and immature name generation. Based on these findings, we propose improvements using contrastive learning and large language models (LLMs). Our study suggests that significant improvements are needed before these DL approaches can be effectively applied to real-world software systems.
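To make the task concrete, here is a minimal, hypothetical sketch (the method names and the sub-token heuristic are illustrative, not taken from the paper or from any evaluated tool): one method whose name matches its body, one whose name does not, and a naive sub-token overlap measure of the kind that retrieval-based approaches build on when comparing identifiers.

```python
import re

def split_subtokens(name: str) -> list[str]:
    """Split a snake_case or camelCase identifier into lowercase sub-tokens."""
    parts = re.split(r"_|(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

def subtoken_overlap(name_a: str, name_b: str) -> float:
    """Jaccard overlap between the sub-token sets of two identifiers."""
    a, b = set(split_subtokens(name_a)), set(split_subtokens(name_b))
    return len(a & b) / len(a | b) if a | b else 0.0

# Consistent: the name describes exactly what the body does.
def get_user_count(users):
    return len(users)

# Inconsistent: named like a read-only getter, but it mutates its argument.
def get_active_users(users):
    users.clear()  # destructive side effect hidden behind a "get" name
    return []
```

Real approaches learn far richer representations of the method body than this lexical heuristic, which is precisely why sub-token overlap alone fails on methods without popular name fragments.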
Related papers
- Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation [6.4212082894269535]
We compare existing leakage detection techniques, namely permutation and n-gram-based methods. Our analysis shows that the n-gram method consistently achieves the highest F1-score. We create cleaned versions of MMLU and HellaSwag, and re-evaluate several LLMs.
arXiv Detail & Related papers (2025-05-30T06:37:39Z) - Examining False Positives under Inference Scaling for Mathematical Reasoning [59.19191774050967]
This paper systematically examines the prevalence of false positive solutions in mathematical problem solving for language models.
We explore how false positives influence the inference time scaling behavior of language models.
arXiv Detail & Related papers (2025-02-10T07:49:35Z) - Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance. We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z) - How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs? [3.1706553206969925]
We perform a meta-evaluation of such methods and assess their reliability across a broad range of tasks.
We observe that while automatic evaluation methods can approximate human ratings under specific conditions, their validity is highly context-dependent.
Our findings enhance the understanding of how automatic methods should be applied and interpreted when developing and evaluating instruction-tuned LLMs.
arXiv Detail & Related papers (2024-02-16T15:48:33Z) - How are We Detecting Inconsistent Method Names? An Empirical Study from Code Review Perspective [13.585460827586926]
Proper naming of methods can make program code easier to understand, and thus enhance software maintainability.
Much research effort has been invested into building automatic tools that can check for method name inconsistency.
We present an empirical study on how state-of-the-art techniques perform in detecting or recommending consistent and inconsistent method names.
arXiv Detail & Related papers (2023-08-24T10:39:18Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Meta Objective Guided Disambiguation for Partial Label Learning [44.05801303440139]
We propose a novel framework for partial label learning with meta objective guided disambiguation (MoGD).
MoGD aims to recover the ground-truth label from candidate labels set by solving a meta objective on a small validation set.
The proposed method can be easily implemented by using various deep networks with the ordinary SGD.
arXiv Detail & Related papers (2022-08-26T06:48:01Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Take More Positives: An Empirical Study of Contrastive Learning in Unsupervised Person Re-Identification [43.11532800327356]
Unsupervised person re-ID aims at closing the performance gap to supervised methods.
We show that the reason why they are successful is not only their label generation mechanisms, but also their unexplored designs.
We propose a contrastive learning method without a memory bank for unsupervised person re-ID.
arXiv Detail & Related papers (2021-01-12T08:06:11Z) - CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.