Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
- URL: http://arxiv.org/abs/2412.02764v1
- Date: Tue, 03 Dec 2024 19:05:37 GMT
- Title: Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
- Authors: Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov
- Abstract summary: This paper introduces the human-curated PandasPlotBench dataset.
It is designed to evaluate language models' effectiveness as assistants in visual data exploration.
- Abstract: This paper introduces the human-curated PandasPlotBench dataset, designed to evaluate language models' effectiveness as assistants in visual data exploration. Our benchmark focuses on generating code for visualizing tabular data - such as a Pandas DataFrame - based on natural language instructions, complementing current evaluation tools and expanding their scope. The dataset includes 175 unique tasks. Our experiments assess several leading Large Language Models (LLMs) across three visualization libraries: Matplotlib, Seaborn, and Plotly. We show that shortening the tasks has a minimal effect on plotting capabilities, allowing for a user interface that accommodates concise user input without sacrificing functionality or accuracy. Our findings also reveal that while LLMs perform well with popular libraries like Matplotlib and Seaborn, challenges persist with Plotly, highlighting areas for improvement. We hope that the modular design of our benchmark will broaden the current studies on generating visualizations. Our benchmark is available online: https://huggingface.co/datasets/JetBrains-Research/plot_bench. The code for running the benchmark is also available: https://github.com/JetBrains-Research/PandasPlotBench.
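A benchmark like this has to execute model-generated code against a table and record whether it runs. The sketch below illustrates that core loop in plain Python; it is a hypothetical harness for illustration, not PandasPlotBench's actual implementation, and the toy list-of-dicts table stands in for a Pandas DataFrame so no plotting library is assumed.

```python
import traceback

def run_generated_code(code: str, data) -> dict:
    """Execute model-generated code in an isolated namespace.

    The namespace exposes the tabular data under the name `df`,
    mirroring how a plotting benchmark might hand a DataFrame to the
    generated snippet. Returns a pass/fail record with any traceback.
    """
    namespace = {"df": data}
    try:
        exec(code, namespace)
        return {"passed": True, "error": None, "namespace": namespace}
    except Exception:
        return {"passed": False, "error": traceback.format_exc(), "namespace": namespace}

# A stand-in for an LLM's answer to a task like "plot the mean of
# column y grouped by x" (here it only computes the grouped means,
# since no plotting library is assumed to be installed):
generated = """
groups = {}
for row in df:
    groups.setdefault(row['x'], []).append(row['y'])
result = {k: sum(v) / len(v) for k, v in groups.items()}
"""

toy_table = [{"x": "a", "y": 1}, {"x": "a", "y": 3}, {"x": "b", "y": 2}]
report = run_generated_code(generated, toy_table)
print(report["passed"], report["namespace"].get("result"))
```

A real harness would additionally sandbox the execution and score the rendered plot against the task, but the pass/fail record above is the shape of each benchmark datapoint's outcome.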
Related papers
- Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [90.98855064914379]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.
Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.
We propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on six widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z)
- Comgra: A Tool for Analyzing and Debugging Neural Networks [35.89730807984949]
We introduce comgra, an open-source Python library for use with PyTorch.
Comgra extracts data about the internal activations of a model and organizes it in a GUI.
It can show both summary statistics and individual data points, compare early and late stages of training, focus on individual samples of interest, and visualize the flow of the gradient through the network.
arXiv Detail & Related papers (2024-07-31T14:57:23Z)
- PyBench: Evaluating LLM Agent on various real-world coding tasks [13.347173063163138]
PyBench is a benchmark covering five main categories of real-world tasks and more than 10 types of files.
Our evaluations indicate that current open-source LLMs are struggling with these tasks.
Our fine-tuned 8B model, PyLlama3, achieves strong performance on PyBench.
arXiv Detail & Related papers (2024-07-23T15:23:14Z)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots [66.95139377783966]
We introduce Plot2Code, a comprehensive visual coding benchmark for Multi-modal Large Language Models.
We collect 132 manually selected high-quality matplotlib plots across six plot types from publicly available matplotlib galleries.
For each plot, we carefully provide its source code and a descriptive instruction summarized by GPT-4.
arXiv Detail & Related papers (2024-05-13T17:59:22Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the widely used BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- Plotly-Resampler: Effective Visual Analytics for Large Time Series [1.0756377625425109]
Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit.
Plotly-Resampler's flexible data aggregation functionality paves the way for research into novel aggregation techniques.
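The kind of aggregation such an add-on performs can be illustrated with a simple min-max downsampler, a common decimation technique for line charts that preserves visual extremes plain striding would miss. This is an illustrative sketch in plain Python, not Plotly-Resampler's actual code:

```python
def minmax_downsample(ys, n_bins):
    """Reduce a series to at most 2 * n_bins points by keeping the
    minimum and maximum of each bin, so that peaks and troughs
    survive the downsampling."""
    if len(ys) <= 2 * n_bins:
        return list(ys)  # already small enough; nothing to aggregate
    out = []
    bin_size = len(ys) / n_bins
    for i in range(n_bins):
        lo, hi = int(i * bin_size), int((i + 1) * bin_size)
        chunk = ys[lo:hi]
        out.append(min(chunk))
        out.append(max(chunk))
    return out

downsampled = minmax_downsample(list(range(1000)), 5)
print(downsampled)
```

Real aggregators additionally track the x-positions of the retained points and re-aggregate on zoom, but the bin-wise min/max selection above is the essence of the technique.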
arXiv Detail & Related papers (2022-06-17T16:12:55Z)
- PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and on the Python Package Index (PyPI).
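To make the task concrete, a toy structural outlier score can be computed from node degrees alone: a degree z-score flags nodes whose connectivity deviates sharply from the graph's average. This is an illustrative stand-in for the learned detectors a library like PyGOD provides, not its API:

```python
from collections import defaultdict
import math

def degree_outlier_scores(edges):
    """Score each node by how far its degree deviates from the mean
    degree, in standard deviations (a z-score). Large positive scores
    flag unusually well-connected nodes."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    mean = sum(values) / len(values)
    variance = sum((d - mean) ** 2 for d in values) / len(values)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero on regular graphs
    return {node: (d - mean) / std for node, d in degree.items()}

# A star graph: node 0 is the hub, so it should score highest.
scores = degree_outlier_scores([(0, 1), (0, 2), (0, 3), (0, 4)])
print(scores)
```

Graph-based detectors in practice also use attributes and neighborhood structure, which is what distinguishes the leading methods PyGOD collects from this degree-only heuristic.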
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
- Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python [77.33905890197269]
We describe a new library that implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems.
The library is coded in C++ and has user-friendly R and Python wrappers.
arXiv Detail & Related papers (2020-06-27T02:39:24Z)
- Little Ball of Fur: A Python Library for Graph Sampling [8.089234432461804]
Little Ball of Fur is a Python library that includes more than twenty graph sampling algorithms.
We show the practical usability of the library by estimating various global statistics of social networks and web graphs.
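Estimating a global statistic from a sample, as described above, can be demonstrated with the simplest strategy in that family: uniform random node sampling. The sketch below is illustrative plain Python, not Little Ball of Fur's API:

```python
import random

def estimate_mean_degree(adjacency, sample_size, seed=0):
    """Estimate a graph's mean degree from a uniform random sample of
    nodes. `adjacency` maps each node to its list of neighbors; the
    seed makes the sample reproducible."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    return sum(len(adjacency[n]) for n in sample) / len(sample)

# The complete graph K6: every node has degree 5, so any sample
# recovers the true mean degree exactly.
K6 = {i: [j for j in range(6) if j != i] for i in range(6)}
print(estimate_mean_degree(K6, 3))
```

On irregular graphs the estimate has variance that shrinks with sample size, which is why libraries in this space offer many sampling strategies (random walks, forest fire, etc.) with different bias/variance trade-offs.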
arXiv Detail & Related papers (2020-06-08T01:35:24Z)
- Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research.
OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains.
For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
arXiv Detail & Related papers (2020-05-02T03:09:50Z)
- giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration [4.8353738137338755]
giotto-tda is a Python library that integrates high-performance topological data analysis with machine learning.
The library's ability to handle various types of data is rooted in a wide range of preprocessing techniques.
arXiv Detail & Related papers (2020-04-06T10:53:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.