PTRAIL -- A python package for parallel trajectory data preprocessing
- URL: http://arxiv.org/abs/2108.13202v1
- Date: Thu, 26 Aug 2021 20:14:07 GMT
- Title: PTRAIL -- A python package for parallel trajectory data preprocessing
- Authors: Salman Haidri, Yaksh J. Haranwala, Vania Bogorny, Chiara Renso,
Vinicius Prado da Fonseca, Amilcar Soares
- Abstract summary: Trajectory data represent a trace of an object that changes its position in space over time.
There is a need for software specifically tailored to preprocess trajectory data.
We propose PTRAIL, a python package offering several trajectory preprocessing steps.
- Score: 2.348339658768759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trajectory data represent a trace of an object that changes its position in
space over time. This kind of data is complex to handle and analyze, since it
is generally produced in huge quantities, often prone to errors generated by
the geolocation device, human mishandling, or area coverage limitation.
Therefore, there is a need for software specifically tailored to preprocess
trajectory data. In this work we propose PTRAIL, a python package offering
several trajectory preprocessing steps, including filtering, feature
extraction, and interpolation. PTRAIL uses parallel computation and
vectorization, being suitable for large datasets and fast compared to other
python libraries.
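The three preprocessing steps named in the abstract (filtering, feature extraction, interpolation) and the per-trajectory parallelism can be illustrated with a small standard-library sketch. This is not PTRAIL's API: every function name, the `(t, lat, lon)` tuple layout, the speed threshold, and the resampling step are assumptions made for illustration only. A thread pool is used for portability; in CPython a process pool would be needed for real CPU-bound speedups.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_by_speed(points, max_speed_mps):
    """Filtering: drop points implying an implausible speed from the last kept point."""
    kept = []
    for t, lat, lon in points:
        if kept:
            pt, plat, plon = kept[-1]
            dt = t - pt
            if dt <= 0 or haversine_m(plat, plon, lat, lon) / dt > max_speed_mps:
                continue  # non-increasing timestamp or likely geolocation error
        kept.append((t, lat, lon))
    return kept


def interpolate_linear(points, step_s):
    """Interpolation: fill gaps longer than step_s with linearly spaced points."""
    if not points:
        return []
    out = [points[0]]
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        t = t0 + step_s
        while t < t1:
            w = (t - t0) / (t1 - t0)
            out.append((t, la0 + w * (la1 - la0), lo0 + w * (lo1 - lo0)))
            t += step_s
        out.append((t1, la1, lo1))
    return out


def add_speed(points):
    """Feature extraction: append speed (m/s) relative to the previous point."""
    out, prev = [], None
    for t, lat, lon in points:
        speed = 0.0
        if prev is not None and t > prev[0]:
            speed = haversine_m(prev[1], prev[2], lat, lon) / (t - prev[0])
        out.append((t, lat, lon, speed))
        prev = (t, lat, lon)
    return out


def preprocess(points, max_speed_mps=50.0, step_s=10.0):
    """Filter, then interpolate, then extract features for one trajectory."""
    cleaned = filter_by_speed(points, max_speed_mps)
    return add_speed(interpolate_linear(cleaned, step_s))


def preprocess_all(trajectories, workers=4):
    """Process each trajectory independently; trajectories maps id -> point list."""
    ids = list(trajectories)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(preprocess, (trajectories[i] for i in ids))
        return dict(zip(ids, results))
```

Because each trajectory is handled independently, the work splits naturally by trajectory id, which is the property that makes this kind of preprocessing easy to parallelize.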
Related papers
- Kamae: Bridging Spark and Keras for Seamless ML Preprocessing [0.0]
Kamae is a Python library that bridges the gap by translating PySpark preprocessing pipelines into equivalent Keras models.
The framework is illustrated on real-world use cases, including MovieLens dataset and Expedia's Learning-to-Rank pipelines.
arXiv Detail & Related papers (2025-07-08T14:30:10Z)
- Timeseria: an object-oriented time series processing library [0.40964539027092917]
Timeseria is an object-oriented time series processing library implemented in Python.
It aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it.
arXiv Detail & Related papers (2024-10-12T15:29:18Z)
- Revisiting CNNs for Trajectory Similarity Learning [20.311950784166388]
We introduce ConvTraj, incorporating both 1D and 2D convolutions to capture sequential and geo-distribution features of trajectories.
We show that ConvTraj achieves state-of-the-art accuracy in trajectory similarity search.
arXiv Detail & Related papers (2024-05-30T07:16:03Z)
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [49.724842920942024]
Industries such as finance, meteorology, and energy generate vast amounts of data daily.
We propose Data-Copilot, a data analysis agent that autonomously performs querying, processing, and visualization of massive data tailored to diverse human requests.
arXiv Detail & Related papers (2023-06-12T16:12:56Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- Data Engineering for HPC with Python [0.0]
Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movements.
One goal of data engineering is to transform data from original data to vector/matrix/tensor formats accepted by deep learning and machine learning applications.
We present a distributed Python API based on table abstraction for representing and processing data.
arXiv Detail & Related papers (2020-10-13T11:53:11Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- PyTorch Distributed: Experiences on Accelerating Data Parallel Training [11.393654219774444]
PyTorch is a widely-adopted scientific computing package used in deep learning research and applications.
This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module.
arXiv Detail & Related papers (2020-06-28T20:39:45Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)