Plotly-Resampler: Effective Visual Analytics for Large Time Series
- URL: http://arxiv.org/abs/2206.08703v1
- Date: Fri, 17 Jun 2022 16:12:55 GMT
- Title: Plotly-Resampler: Effective Visual Analytics for Large Time Series
- Authors: Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost, Sofie Van Hoecke
- Abstract summary: Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit.
Plotly-Resampler's flexible data aggregation functionality paves the path towards researching novel aggregation techniques.
- Score: 1.0756377625425109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual analytics is arguably the most important step in getting acquainted
with your data. This is especially the case for time series, as this data type
is hard to describe and cannot be fully understood using, for example,
summary statistics. To realize effective time series visualization, four
requirements have to be met: a tool should be (1) interactive, (2) scalable to
millions of data points, (3) integrable in conventional data science
environments, and (4) highly configurable. We observe that open source Python
visualization toolkits empower data scientists in most visual analytics tasks,
but lack the combination of scalability and interactivity to realize effective
time series visualization. To meet these requirements, we
created Plotly-Resampler, an open source Python library. Plotly-Resampler is an
add-on for Plotly's Python bindings, enhancing line chart scalability on top of
an interactive toolkit by aggregating the underlying data depending on the
current graph view. Plotly-Resampler is built to be snappy, as the reactivity
of a tool qualitatively affects how analysts visually explore and analyze data.
A benchmark task highlights how our toolkit scales better than alternatives in
terms of number of samples and time series. Additionally, Plotly-Resampler's
flexible data aggregation functionality paves the path towards researching
novel aggregation techniques. Plotly-Resampler's integrability, together with
its configurability, convenience, and high scalability, allows users to
effectively analyze high-frequency data in their day-to-day Python environment.
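The abstract's core idea, aggregating the underlying data depending on the current graph view, can be illustrated with a minimal standard-library sketch. This is not Plotly-Resampler's API or its actual algorithm: the function name `aggregate_view`, its parameters, and the choice of min-max bucketing are assumptions made purely for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def aggregate_view(x, y, view_start, view_end, max_points=1000):
    """Hypothetical helper: downsample the samples visible in a view.

    x must be sorted ascending and max_points should be at least 2.
    If the view holds more than max_points samples, it is split into
    buckets and only each bucket's minimum and maximum are kept, so
    peaks and dips survive the reduction.
    """
    lo = bisect_left(x, view_start)   # first sample inside the view
    hi = bisect_right(x, view_end)    # one past the last sample inside
    n = hi - lo
    if n <= max_points:
        # Few enough samples: return the raw data for this view.
        return x[lo:hi], y[lo:hi]

    n_buckets = max(1, max_points // 2)  # two output samples per bucket
    out_x, out_y = [], []
    for b in range(n_buckets):
        start = lo + b * n // n_buckets
        stop = lo + (b + 1) * n // n_buckets
        idx = range(start, stop)
        i_min = min(idx, key=y.__getitem__)
        i_max = max(idx, key=y.__getitem__)
        # Emit in index order so the output stays sorted along x.
        for i in sorted({i_min, i_max}):
            out_x.append(x[i])
            out_y.append(y[i])
    return out_x, out_y


# Synthetic series (assumed data, not from the paper).
x = list(range(100_000))
y = [math.sin(i / 500) for i in x]

# Zoomed out: 100,000 samples are reduced to at most 200.
full_x, full_y = aggregate_view(x, y, 0, 99_999, max_points=200)

# Zoomed in: the view holds only 101 samples, so raw data is returned.
zoom_x, zoom_y = aggregate_view(x, y, 1_000, 1_100, max_points=200)
```

Re-running such a function whenever the user pans or zooms keeps the number of plotted points bounded while the detail shown grows as the view narrows.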
Related papers
- Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code [1.5999407512883512]
This paper introduces the human-curated PandasPlotBench dataset.
It is designed to evaluate language models' effectiveness as assistants in visual data exploration.
arXiv Detail & Related papers (2024-12-03T19:05:37Z)
- Timeseria: an object-oriented time series processing library [0.40964539027092917]
Timeseria is an object-oriented time series processing library implemented in Python.
It aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it.
arXiv Detail & Related papers (2024-10-12T15:29:18Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets.
We benchmark each dataset and find that the performance of common models can vary drastically across datasets.
TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z)
- GenPlot: Increasing the Scale and Diversity of Chart Derendering Data [0.0]
We propose GenPlot, a plot generator that can generate billions of additional plots for chart-derendering using synthetic data.
OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks.
arXiv Detail & Related papers (2023-06-20T17:25:53Z)
- PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
- VizExtract: Automatic Relation Extraction from Data Visualizations [7.2241069295727955]
This paper presents a framework for automatically extracting compared variables from statistical charts.
We leverage a computer vision based framework to automatically identify and localize visualization facets in line graphs, scatter plots, or bar graphs.
In controlled experiments, our framework is able to classify, with 87.5% accuracy, the correlation between variables for graphs with 1-3 series per graph, varying colors, and solid line styles.
arXiv Detail & Related papers (2021-12-07T04:27:08Z)
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research.
OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains.
For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
arXiv Detail & Related papers (2020-05-02T03:09:50Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.