Related papers: Toeplitz Least Squares Problems, Fast Algorithms and Big Data

Toeplitz Least Squares Problems, Fast Algorithms and Big Data

URL: http://arxiv.org/abs/2112.12994v1
Date: Fri, 24 Dec 2021 08:32:09 GMT
Title: Toeplitz Least Squares Problems, Fast Algorithms and Big Data
Authors: Ali Eshragh, Oliver Di Pietro and Michael A. Saunders
Abstract summary: Two recent algorithms have applied randomized numerical linear algebra techniques to fitting an autoregressive model to big time-series data. We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data. While both algorithms display comparable results for synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data.
Score: 1.3535770763481905
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In time series analysis, when fitting an autoregressive model, one must solve a Toeplitz ordinary least squares problem numerous times to find an appropriate model, which can severely affect computational times with large data sets. Two recent algorithms (LSAR and Repeated Halving) have applied randomized numerical linear algebra (RandNLA) techniques to fitting an autoregressive model to big time-series data. We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data. While both algorithms display comparable results for synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data. We conclude that RandNLA is effective in the context of big-data time series.

Related papers

Optimizing VarLiNGAM for Scalable and Efficient Time Series Causal Discovery [5.430532390358285]
Causal discovery is designed to identify causal relationships in data. Time series causal discovery is particularly challenging due to the need to account for temporal dependencies and potential time lag effects. This study significantly improves the feasibility of processing large datasets.
arXiv Detail & Related papers (2024-09-09T10:52:58Z)
SALSA: Sequential Approximate Leverage-Score Algorithm with Application in Analyzing Big Time Series Data [46.42365692992566]
We develop a new efficient sequential approximate leverage score algorithm, SALSA, using methods from randomized numerical linear algebra. We show that the theoretical computational complexity and numerical accuracy of SALSA surpass existing approximations. Our proposed algorithm is, with high probability, guaranteed to find the maximum likelihood estimates of the parameters for the true underlying ARMA model.
arXiv Detail & Related papers (2023-12-30T02:36:53Z)
Making RL with Preference-based Feedback Efficient via Randomization [11.019088464664696]
Reinforcement Learning algorithms that learn from human feedback need to be efficient in terms of statistical complexity, computational complexity, and query complexity. We present an algorithm that is sample efficient (i.e. has near-optimal-case regret bounds) and has running time (i.e. computational complexity is worst with respect to relevant parameters) To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling.
arXiv Detail & Related papers (2023-10-23T04:19:35Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Ranking with Confidence for Large Scale Comparison Data [1.2183405753834562]
In this work, we leverage a generative data model considering comparison noise to develop a fast, precise, and informative ranking from pairwise comparisons. In real data, PD-Rank requires less computational time to achieve the same Kendall algorithm than active learning methods.
arXiv Detail & Related papers (2022-02-03T16:36:37Z)
SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory. We propose StreaMRAK - a streaming version of KRR. We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
Deep Time Series Models for Scarce Data [8.673181404172963]
Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research. Data scarcity is a universal issue that occurs in a vast range of data analytics problems.
arXiv Detail & Related papers (2021-03-16T22:16:54Z)
Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information. The proposed architecture achieves state-of-the-art performance while utilizing substantially less trainable parameters when compared to comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates. The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature. It has a low runtime $O(mp2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.