Related papers: Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models

URL: http://arxiv.org/abs/2509.25378v1
Date: Mon, 29 Sep 2025 18:30:02 GMT
Title: Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models
Authors: Akalanka Galappaththi, Francisco Ribeiro, Sarah Nadi
Abstract summary: This paper introduces DSCHECKER, an LLM-based approach for detecting and fixing API misuses of data science libraries. We identify two key pieces of information, API directives and data information, that may be beneficial for API misuse detection and fixing. We find that the DSCHECKER agent achieves a 48.65 percent detection F1-score and fixes 39.47 percent of the misuses.
Score: 0.6958509696068848
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data science libraries, such as scikit-learn and pandas, specialize in processing and manipulating data. The data-centric nature of these libraries makes the detection of API misuse in them more challenging. This paper introduces DSCHECKER, an LLM-based approach designed for detecting and fixing API misuses of data science libraries. We identify two key pieces of information, API directives and data information, that may be beneficial for API misuse detection and fixing. Using three LLMs and misuses from five data science libraries, we experiment with various prompts. We find that incorporating API directives and data-specific details enhances DSCHECKER's ability to detect and fix API misuses, with the best-performing model achieving a detection F1-score of 61.18 percent and fixing 51.28 percent of the misuses. Building on these results, we implement the DSCHECKER agent, which includes an adaptive function calling mechanism to access information on demand, simulating a real-world setting where information about the misuse is unknown in advance. We find that the DSCHECKER agent achieves a 48.65 percent detection F1-score and fixes 39.47 percent of the misuses, demonstrating the promise of LLM-based API misuse detection and fixing in real-world scenarios.
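
The detection F1-scores quoted in the abstract combine precision and recall into a single number. As a minimal sketch of the standard F1 definition (the harmonic mean of precision and recall), and not the paper's evaluation code, the function below computes it from counts of true positives, false positives, and false negatives. The function name and the example counts are illustrative assumptions, not figures from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    tp: misuses correctly flagged, fp: correct usages wrongly flagged,
    fn: misuses that were missed. Returns 0.0 when F1 is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper):
# 8 misuses detected, 2 false alarms, 2 misuses missed.
print(f"{f1_score(8, 2, 2):.2%}")  # prints 80.00%
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is a common single-number summary for detection tasks.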

Related papers

Improving Deep Learning Library Testing with Machine Learning [40.21709249669499]
We explore using machine learning (ML) to determine input validity. Shape relationships are a precise abstraction to encode concrete inputs and capture of the data. We show that ML-enhanced input classification is an important aid to scale DL library testing.
arXiv Detail & Related papers (2026-02-03T17:19:01Z)
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS [52.483888557864326]
APIKG4SYN is a framework designed to exploit API knowledge graphs for the construction of API-oriented question-code pairs. We build the first benchmark for HarmonyOS code generation using APIKG4SYN.
arXiv Detail & Related papers (2025-11-29T08:13:54Z)
Towards Automated Error Discovery: A Study in Conversational AI [48.735443116662026]
We introduce Automated Error Discovery, a framework for detecting and defining errors in conversational AI. We also propose SEEED (Soft Clustering Extended-Based Error Detection), an encoder-based approach to its implementation.
arXiv Detail & Related papers (2025-09-13T14:53:22Z)
Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs [1.6691048566825868]
KG2data is a system that integrates knowledge graphs, large language models (LLMs), ReAct agents, and tool-use technologies. Using a virtual API, we evaluate API call accuracy across three metrics: name recognition failure, hallucination failure, and call correctness. KG2data achieves superior performance (1.43%, 0%, 88.57%) compared to RAG2data (16%, 10%, 72.14%) and chat2data (7.14%, 8.57%, 71.43%).
arXiv Detail & Related papers (2025-07-14T08:20:06Z)
Hey, That's My Data! Label-Only Dataset Inference in Large Language Models [63.35066172530291]
CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
arXiv Detail & Related papers (2025-06-06T13:02:59Z)
LLM-assisted Mutation for Whitebox API Testing [40.91007243855959]
MioHint is a novel white-box API testing approach that leverages the code comprehension capabilities of large language models (LLMs) to boost API testing. To evaluate the effectiveness of our method, we conducted experiments across 16 real-world API services.
arXiv Detail & Related papers (2025-04-08T07:14:51Z)
Identifying and Mitigating API Misuse in Large Language Models [26.4403427473915]
API misuse in code generated by large language models (LLMs) represents a serious emerging challenge in software development. This paper presents the first comprehensive study of API misuse patterns in LLM-generated code, analyzing both method selection and parameter usage across Python and Java. We propose Dr.Fix, a novel LLM-based automatic program repair approach for API misuse based on the aforementioned taxonomy.
arXiv Detail & Related papers (2025-03-28T18:43:12Z)
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
An Empirical Study of API Misuses of Data-Centric Libraries [9.667988837321943]
This paper contributes an empirical study of API misuses of five data-centric libraries that cover areas such as data processing, numerical computation, machine learning, and visualization. We identify misuses of these libraries by analyzing data from both Stack Overflow and GitHub.
arXiv Detail & Related papers (2024-08-28T15:15:52Z)
Anomaly Detection of Tabular Data Using LLMs [54.470648484612866]
We show that pre-trained large language models (LLMs) are zero-shot batch-level anomaly detectors. We propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies.
arXiv Detail & Related papers (2024-06-24T04:17:03Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection, as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.