pyFAST: A Modular PyTorch Framework for Time Series Modeling with Multi-source and Sparse Data
- URL: http://arxiv.org/abs/2508.18891v1
- Date: Tue, 26 Aug 2025 10:05:47 GMT
- Title: pyFAST: A Modular PyTorch Framework for Time Series Modeling with Multi-source and Sparse Data
- Authors: Zhijin Wang, Senzhen Wu, Yue Hu, Xiufeng Liu
- Abstract summary: pyFAST is a research-oriented PyTorch framework for time series analysis. Its data engine is engineered for complex scenarios, supporting multi-source loading, protein sequence handling, efficient sequence- and patch-level padding, dynamic normalization, and mask-based modeling. Released under the MIT license at GitHub, pyFAST provides a compact yet powerful platform for advancing time series research and applications.
- Score: 10.949140998070732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern time series analysis demands frameworks that are flexible, efficient, and extensible. However, many existing Python libraries exhibit limitations in modularity and in their native support for irregular, multi-source, or sparse data. We introduce pyFAST, a research-oriented PyTorch framework that explicitly decouples data processing from model computation, fostering a cleaner separation of concerns and facilitating rapid experimentation. Its data engine is engineered for complex scenarios, supporting multi-source loading, protein sequence handling, efficient sequence- and patch-level padding, dynamic normalization, and mask-based modeling for both imputation and forecasting. pyFAST integrates LLM-inspired architectures for the alignment-free fusion of sparse data sources and offers native sparse metrics, specialized loss functions, and flexible exogenous data fusion. Training utilities include batch-based streaming aggregation for evaluation and device synergy to maximize computational efficiency. A comprehensive suite of classical and deep learning models (Linears, CNNs, RNNs, Transformers, and GNNs) is provided within a modular architecture that encourages extension. Released under the MIT license at GitHub, pyFAST provides a compact yet powerful platform for advancing time series research and applications.
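The mask-based modeling and native sparse metrics described in the abstract can be illustrated with a minimal sketch. Note that `masked_mse` is a hypothetical helper written for illustration, not pyFAST's actual API: the idea is simply that a metric over partially observed series should only score the entries that were actually observed.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE over observed entries only. `mask` is 1 where `target` is
    observed and 0 where it is missing (hypothetical helper for
    illustration, not pyFAST's actual API)."""
    mask = mask.astype(float)
    return float(((pred - target) ** 2 * mask).sum() / mask.sum())

target = np.array([1.0, 0.0, 3.0])   # 0.0 here is a missing-value placeholder
mask   = np.array([1, 0, 1])         # middle entry is unobserved
pred   = np.array([1.0, 9.0, 5.0])   # large error at the masked slot is ignored
print(masked_mse(pred, target, mask))  # → 2.0
```

Only the two observed entries contribute, so the masked error is (0 + 4) / 2 = 2.0 regardless of what the model predicts at the missing position.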
Related papers
- PyLate: Flexible Training and Retrieval for Late Interaction Models [3.737581531719168]
We introduce PyLate, a library built on top of Sentence Transformers to support multi-vector architectures. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models. PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT.
arXiv Detail & Related papers (2025-08-05T15:23:40Z)
- Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [83.65386456026441]
Data-Juicer 2.0 is a data processing system backed by 100+ data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, synthesis, annotation, and foundation model post-training. The system is publicly available and has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
- Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion [4.164728134421114]
We introduce AutoFusion, a framework that fuses distinct model parameters for multi-task learning without pre-trained checkpoints.
We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets.
Our framework offers a scalable and flexible solution for model integration, positioning it as a powerful tool for future research and practical applications.
arXiv Detail & Related papers (2024-10-08T07:21:24Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- FlexModel: A Framework for Interpretability of Distributed Large Language Models [0.0]
We present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations.
The library is compatible with existing model distribution libraries and encapsulates PyTorch models.
It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals.
arXiv Detail & Related papers (2023-12-05T21:19:33Z)
- MatFormer: Nested Transformer for Elastic Inference [91.45687988953435]
MatFormer is a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model. We show that an 850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters.
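The nested-FFN idea can be sketched generically: a sub-model reuses the first fraction of the full model's hidden units, so extracted smaller models share their weights with the full one. This is a minimal NumPy illustration under assumed shapes, not MatFormer's released code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1 = rng.normal(size=(d_model, d_ff))   # input projection of the FFN
W2 = rng.normal(size=(d_ff, d_model))   # output projection of the FFN

def nested_ffn(x, frac=1.0):
    # A sub-network uses only the first `frac` of the hidden units,
    # so every smaller model is a prefix slice of the full model.
    k = int(d_ff * frac)
    h = np.maximum(x @ W1[:, :k], 0.0)  # ReLU on the sliced hidden layer
    return h @ W2[:k, :]

x = rng.normal(size=(4, d_model))
full = nested_ffn(x, 1.0)   # full-capacity model
half = nested_ffn(x, 0.5)   # extracted smaller model, same weights
```

No retraining is needed to obtain the smaller model: both calls read from the same `W1`/`W2`, which is what makes the family of sub-models "nested".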
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series [20.491714178518155]
PyPOTS is an open-source library for data mining and analysis. It provides easy access to diverse algorithms categorized into five tasks. PyPOTS is available on PyPI, Anaconda, and Docker.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Merlion: A Machine Learning Library for Time Series [73.46386700728577]
Merlion is an open-source machine learning library for time series.
It features a unified interface for models and datasets for anomaly detection and forecasting.
Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production.
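The live-deployment simulation described above can be sketched as a rolling backtest. This is a toy illustration with a naive mean forecaster, not Merlion's actual evaluator: the model is re-fit on the growing history at a fixed cadence, mimicking periodic re-training in production.

```python
import numpy as np

def rolling_backtest(series, horizon=2, step=2):
    """Repeatedly fit on the history seen so far, forecast the next
    `horizon` points, then roll forward by `step`, mimicking periodic
    re-training of a deployed model (illustrative sketch only)."""
    errors, t = [], len(series) // 2   # start forecasting from mid-series
    while t + horizon <= len(series):
        history = series[:t]
        forecast = np.full(horizon, history.mean())  # naive mean forecaster
        errors.append(np.abs(forecast - series[t:t + horizon]).mean())
        t += step
    return float(np.mean(errors))

series = np.arange(10.0)              # 0, 1, ..., 9
print(rolling_backtest(series))       # → 4.0
```

Because the mean forecaster lags a trending series, the backtest reports a large error here; swapping in a better model would lower the score, which is exactly the comparison such an evaluation loop enables.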
arXiv Detail & Related papers (2021-09-20T02:03:43Z)
- pyWATTS: Python Workflow Automation Tool for Time Series [0.20315704654772418]
pyWATTS is a non-sequential workflow automation tool for the analysis of time series data.
pyWATTS includes modules with clearly defined interfaces to enable seamless integration of new or existing methods.
pyWATTS supports key Python machine learning libraries such as scikit-learn, PyTorch, and Keras.
arXiv Detail & Related papers (2021-06-18T14:50:11Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.