Toeplitz Least Squares Problems, Fast Algorithms and Big Data
- URL: http://arxiv.org/abs/2112.12994v1
- Date: Fri, 24 Dec 2021 08:32:09 GMT
- Title: Toeplitz Least Squares Problems, Fast Algorithms and Big Data
- Authors: Ali Eshragh, Oliver Di Pietro and Michael A. Saunders
- Abstract summary: Two recent algorithms have applied randomized numerical linear algebra techniques to fitting an autoregressive model to big time-series data.
We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data.
While both algorithms display comparable results for synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data.
- Score: 1.3535770763481905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In time series analysis, when fitting an autoregressive model, one must solve
a Toeplitz ordinary least squares problem numerous times to find an appropriate
model, which can severely affect computational times with large data sets. Two
recent algorithms (LSAR and Repeated Halving) have applied randomized numerical
linear algebra (RandNLA) techniques to fitting an autoregressive model to big
time-series data. We investigate and compare the quality of these two
approximation algorithms on large-scale synthetic and real-world data. While
both algorithms display comparable results for synthetic datasets, the LSAR
algorithm appears to be more robust when applied to real-world time series
data. We conclude that RandNLA is effective in the context of big-data time
series.
Related papers
- Optimizing VarLiNGAM for Scalable and Efficient Time Series Causal Discovery [5.430532390358285]
Causal discovery is designed to identify causal relationships in data.
Time series causal discovery is particularly challenging due to the need to account for temporal dependencies and potential time lag effects.
This study significantly improves the feasibility of processing large datasets.
arXiv Detail & Related papers (2024-09-09T10:52:58Z) - SALSA: Sequential Approximate Leverage-Score Algorithm with Application
in Analyzing Big Time Series Data [46.42365692992566]
We develop a new efficient sequential approximate leverage score algorithm, SALSA, using methods from randomized numerical linear algebra.
We show that the theoretical computational complexity and numerical accuracy of SALSA surpass existing approximations.
Our proposed algorithm is, with high probability, guaranteed to find the maximum likelihood estimates of the parameters for the true underlying ARMA model.
arXiv Detail & Related papers (2023-12-30T02:36:53Z) - Making RL with Preference-based Feedback Efficient via Randomization [11.019088464664696]
Reinforcement Learning algorithms that learn from human feedback need to be efficient in terms of statistical complexity, computational complexity, and query complexity.
We present an algorithm that is sample efficient (i.e. has near-optimal-case regret bounds) and has running time (i.e. computational complexity is worst with respect to relevant parameters)
To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling.
arXiv Detail & Related papers (2023-10-23T04:19:35Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Ranking with Confidence for Large Scale Comparison Data [1.2183405753834562]
In this work, we leverage a generative data model considering comparison noise to develop a fast, precise, and informative ranking from pairwise comparisons.
In real data, PD-Rank requires less computational time to achieve the same Kendall algorithm than active learning methods.
arXiv Detail & Related papers (2022-02-03T16:36:37Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Deep Time Series Models for Scarce Data [8.673181404172963]
Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research.
Data scarcity is a universal issue that occurs in a vast range of data analytics problems.
arXiv Detail & Related papers (2021-03-16T22:16:54Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially less trainable parameters when compared to comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.