Related papers: Plotly-Resampler: Effective Visual Analytics for Large Time Series

URL: http://arxiv.org/abs/2206.08703v1
Date: Fri, 17 Jun 2022 16:12:55 GMT
Title: Plotly-Resampler: Effective Visual Analytics for Large Time Series
Authors: Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost, Sofie Van Hoecke
Abstract summary: Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit. Plotly-Resampler's flexible data aggregation functionality paves the path towards researching novel aggregation techniques.
Score: 1.0756377625425109
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual analytics is arguably the most important step in getting acquainted with your data. This is especially the case for time series, as this data type is hard to describe and cannot be fully understood when using, for example, summary statistics. To realize effective time series visualization, four requirements have to be met: a tool should be (1) interactive, (2) scalable to millions of data points, (3) integrable in conventional data science environments, and (4) highly configurable. We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity needed to realize effective time series visualization. To meet these requirements, we created Plotly-Resampler, an open source Python library. Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit by aggregating the underlying data depending on the current graph view. Plotly-Resampler is built to be snappy, as the reactivity of a tool qualitatively affects how analysts visually explore and analyze data. A benchmark task highlights how our toolkit scales better than alternatives in terms of number of samples and time series. Additionally, Plotly-Resampler's flexible data aggregation functionality paves the path towards researching novel aggregation techniques. Plotly-Resampler's integrability, together with its configurability, convenience, and high scalability, allows users to effectively analyze high-frequency data in their day-to-day Python environment.
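
The idea of aggregating the underlying data depending on the current graph view can be illustrated with a small sketch. This is not Plotly-Resampler's code or API: the function name `aggregate_view`, the parameter `n_out`, and the choice of a simple per-bin min-max scheme are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from bisect import bisect_left, bisect_right


def aggregate_view(xs, ys, x_min, x_max, n_out=1000):
    """Return the samples of (xs, ys) to draw for the view [x_min, x_max].

    Hypothetical helper, not part of any library. xs must be sorted
    ascending. If the view holds more than n_out samples, it is split
    into bins and only each bin's minimum and maximum sample are kept.
    """
    # Restrict to the samples that are visible in the current view.
    lo = bisect_left(xs, x_min)
    hi = bisect_right(xs, x_max)
    n_visible = hi - lo
    if n_visible <= n_out:
        # Few enough points: send the raw data.
        return xs[lo:hi], ys[lo:hi]

    n_bins = max(1, n_out // 2)  # two points (min and max) per bin
    out_x, out_y = [], []
    for b in range(n_bins):
        start = lo + b * n_visible // n_bins
        stop = lo + (b + 1) * n_visible // n_bins
        idx = range(start, stop)
        i_min = min(idx, key=ys.__getitem__)
        i_max = max(idx, key=ys.__getitem__)
        # Emit in index order so the output stays sorted by x.
        for i in sorted({i_min, i_max}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


# A synthetic series, viewed once in full and once zoomed in.
xs = list(range(200_000))
ys = [math.sin(i / 5000) for i in xs]
full_x, full_y = aggregate_view(xs, ys, xs[0], xs[-1], n_out=1000)
zoom_x, zoom_y = aggregate_view(xs, ys, 50_000, 50_500, n_out=1000)
```

In the full view the series is reduced to at most `n_out` points, while the zoomed view is narrow enough that the raw samples are returned unchanged. Re-running such a function whenever the visible range changes is one way to keep the number of plotted points bounded while preserving detail on zoom.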

Related papers

Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots [0.0]
We show that current large language models, with proper instructions and engineered prompts, are capable of accurately extracting data from plots. This capability is inherent to the pretrained models and can be achieved with a chain-of-thought sequence of zero-shot engineered prompts.
arXiv Detail & Related papers (2025-03-16T02:41:43Z)
Graphint: Graph-based Time Series Clustering Visualisation Tool [21.763409747687348]
Graphint is an innovative system based on the $k$-Graph methodology. It integrates a robust time series clustering algorithm with an interactive tool for comparison and interpretation.
arXiv Detail & Related papers (2025-03-10T17:20:02Z)
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code [1.5999407512883512]
PandasPlotBench is designed to evaluate language models' effectiveness as assistants in visual data exploration. The dataset includes 175 unique tasks. Our experiments assess several leading Large Language Models (LLMs) across three visualization libraries: Matplotlib, Seaborn, and Plotly.
arXiv Detail & Related papers (2024-12-03T19:05:37Z)
Exploring Scalability in Large-Scale Time Series in DeepVATS framework [3.8436076642278754]
DeepVATS is a tool that merges Deep Learning (Deep) with Visual Analytics (VA) for the analysis of large time series data (TS). The Deep Learning module, developed in R, manages the load of datasets and Deep Learning models from and to the Storage module. This paper introduces the tool and examines its scalability through log analytics.
arXiv Detail & Related papers (2024-08-08T15:30:48Z)
Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure. This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets. We benchmark each dataset and find that the performance of common models can vary drastically across datasets. TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z)
GenPlot: Increasing the Scale and Diversity of Chart Derendering Data [0.0]
We propose GenPlot, a plot generator that can generate billions of additional plots for chart-derendering using synthetic data. OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks.
arXiv Detail & Related papers (2023-06-20T17:25:53Z)
PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series. It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
VizExtract: Automatic Relation Extraction from Data Visualizations [7.2241069295727955]
This paper presents a framework for automatically extracting compared variables from statistical charts. We leverage a computer vision based framework to automatically identify and localize visualization facets in line graphs, scatter plots, or bar graphs. In controlled experiments, our framework is able to classify, with 87.5% accuracy, the correlation between variables for graphs with 1-3 series per graph, varying colors, and solid line styles.
arXiv Detail & Related papers (2021-12-07T04:27:08Z)
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world. Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information. Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation. We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
MusPy: A Toolkit for Symbolic Music Generation [32.01713268702699]
MusPy is an open source Python library for symbolic music generation. In this paper, we present statistical analysis of the eleven datasets currently supported by MusPy.
arXiv Detail & Related papers (2020-08-05T06:16:13Z)
Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
arXiv Detail & Related papers (2020-05-02T03:09:50Z)
PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support. Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space. It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.