Evaluating and Improving ChatGPT-Based Expansion of Abbreviations
- URL: http://arxiv.org/abs/2410.23866v1
- Date: Thu, 31 Oct 2024 12:20:24 GMT
- Title: Evaluating and Improving ChatGPT-Based Expansion of Abbreviations
- Authors: Yanjie Jiang, Hui Liu, Lu Zhang
- Abstract summary: We present the first empirical study on large language model (LLM)-based abbreviation expansion.
Our evaluation results suggest that ChatGPT is substantially less accurate than the state-of-the-art approach.
In response to the first cause, we investigated the effect of various contexts and found that surrounding source code is the best choice.
- Score: 6.900119856872516
- Abstract: Source code identifiers often contain abbreviations. Such abbreviations may reduce the readability of the source code, which in turn hinders the maintenance of software applications. Consequently, accurate and automated approaches to expanding abbreviations in source code are desirable, and abbreviation expansion has been intensively investigated. However, to the best of our knowledge, most existing approaches are heuristic, and none of them has employed deep learning techniques, let alone the most advanced large language models (LLMs). LLMs have demonstrated cutting-edge performance in various software engineering tasks, and thus have the potential to expand abbreviations automatically. To this end, in this paper, we present the first empirical study on LLM-based abbreviation expansion. Our evaluation results on a public benchmark suggest that ChatGPT is substantially less accurate than the state-of-the-art approach, reducing precision and recall by 28.2% and 27.8%, respectively. We manually analyzed the failed cases and discovered the root causes of the failures: 1) lack of context and 2) inability to recognize abbreviations. In response to the first cause, we investigated the effect of various contexts and found that surrounding source code is the best choice. In response to the second cause, we designed an iterative approach that identifies and explicitly marks missed abbreviations in prompts. Finally, we proposed a post-condition check that excludes incorrect expansions violating common sense. Together, these measures make ChatGPT-based abbreviation expansion comparable to the state of the art while avoiding the expensive source code parsing and deep analysis that state-of-the-art approaches require.
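The recipe outlined in the abstract (surrounding code as context, iterative re-prompting with missed abbreviations explicitly marked, and a post-condition check) can be pictured with a short sketch. Everything below is illustrative: the `chat` callable stands in for any LLM API, and the `<<...>>` marker format, the missed-abbreviation heuristic, and the subsequence-based post-condition are our assumptions, not the paper's actual prompts or checks.

```python
import re
from typing import Callable

def _tokens(name: str) -> list[str]:
    # Split camelCase / snake_case identifiers into word tokens.
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)

def _is_subsequence(small: str, big: str) -> bool:
    it = iter(big)
    return all(ch in it for ch in small)

def _passes_postcondition(identifier: str, expansion: str) -> bool:
    # Stand-in for the paper's commonsense filter: every original token must
    # be a subsequence of some expanded word, so "msg" -> "message" passes
    # while "msg" -> "text" is rejected.
    words = [w.lower() for w in _tokens(expansion)]
    return all(
        any(_is_subsequence(tok.lower(), w) for w in words)
        for tok in _tokens(identifier)
    )

def expand_abbreviations(
    identifier: str,
    surrounding_code: str,
    chat: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Iteratively ask an LLM to expand abbreviations in `identifier`."""
    prompt = (
        "Expand every abbreviation in the identifier below.\n"
        f"Identifier: {identifier}\n"
        f"Surrounding code:\n{surrounding_code}\n"
        "Answer with the expanded identifier only."
    )
    best = identifier
    for _ in range(max_rounds):
        answer = chat(prompt).strip()
        if _passes_postcondition(identifier, answer):
            best = answer
        # Short tokens that survive verbatim are treated as abbreviations
        # the model failed to recognize.
        answer_words = {w.lower() for w in _tokens(answer)}
        missed = [t for t in _tokens(identifier)
                  if len(t) <= 3 and t.lower() in answer_words]
        if not missed:
            break
        marked = ", ".join(f"<<{t}>>" for t in missed)
        prompt += f"\nThese tokens are abbreviations and must be expanded: {marked}"
    return best
```

With a real backend, `expand_abbreviations("msgCnt", context, chat)` should converge on something like `messageCount`; the point of the loop is that unrecognized abbreviations are named explicitly in the next prompt rather than left for the model to rediscover.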
Related papers
- Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4's effectiveness in recommending and suggesting idiomatic actions.
Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-Augmentation over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- Automated Extraction of Acronym-Expansion Pairs from Scientific Papers [0.0]
This project addresses challenges posed by the widespread use of abbreviations and acronyms in digital texts.
We propose a novel method that combines document preprocessing, regular expressions, and a large language model to identify abbreviations and map them to their corresponding expansions (a minimal sketch of the regex step appears after this list).
arXiv Detail & Related papers (2024-12-02T04:05:49Z)
- Chain-of-Thought Reasoning Without Prompting [40.92854235219315]
CoT reasoning paths can be elicited from pre-trained language models by simply altering the decoding process.
The presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer.
arXiv Detail & Related papers (2024-02-15T18:55:41Z)
- BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z)
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process (a sketch of the prompt template appears after this list).
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
- Dealing with Abbreviations in the Slovenian Biographical Lexicon [2.0810096547938164]
Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors.
We propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text.
arXiv Detail & Related papers (2022-11-04T13:09:02Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performances in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Structured abbreviation expansion in context [12.000998471674649]
We consider the task of reversing ad hoc abbreviations in context to recover normalized, expanded versions of abbreviated messages.
The problem is related to, but distinct from, spelling correction, in that ad hoc abbreviations are intentional and may involve substantial differences from the original words.
arXiv Detail & Related papers (2021-10-04T01:22:43Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
- Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation [5.766754189548904]
Acronym disambiguation aims to find the correct meaning of an ambiguous acronym in a text.
We propose a Hierarchical Dual-path BERT method coined hdBERT to capture the general fine-grained and high-level specific representations.
Using the widely adopted SciAD dataset, which contains 62,441 sentences, we investigate the effectiveness of hdBERT.
arXiv Detail & Related papers (2021-07-01T09:10:00Z)
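For the acronym-extraction entry above, the regular-expression stage can be illustrated with a toy version; the pattern, the initial-matching rule, and the LLM fallback are our illustration, not the paper's code:

```python
import re

# Matches a parenthesized acronym such as "(NLP)" following its long form.
ACRONYM_IN_PARENS = re.compile(r"\(([A-Z][A-Za-z]{1,9})\)")

def extract_pairs(text: str) -> dict[str, str]:
    """Map acronyms to expansions guessed from the words preceding them."""
    pairs: dict[str, str] = {}
    for m in ACRONYM_IN_PARENS.finditer(text):
        acronym = m.group(1)
        # Take as many words before the parenthesis as the acronym has letters.
        preceding = text[:m.start()].split()
        candidate = preceding[-len(acronym):]
        # Keep the pair only if the initials line up with the acronym;
        # a full pipeline would fall back to the LLM for unmatched cases.
        if [w[0].upper() for w in candidate] == list(acronym.upper()):
            pairs[acronym] = " ".join(candidate)
    return pairs

print(extract_pairs("We study natural language processing (NLP) at scale."))
# -> {'NLP': 'natural language processing'}
```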
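And for the Re2 entry, the whole method amounts to a prompt template that presents the input twice before eliciting an answer; the cue phrase below follows the paper's description, but treat the exact wording as an approximation:

```python
def re2_prompt(question: str) -> str:
    # Re2: re-read the question before answering; the re-reading line can be
    # combined with thought-eliciting cues such as Chain-of-Thought.
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A: Let's think step by step."
    )
```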