On Creating a Causally Grounded Usable Rating Method for Assessing the Robustness of Foundation Models Supporting Time Series
- URL: http://arxiv.org/abs/2502.12226v1
- Date: Mon, 17 Feb 2025 15:26:16 GMT
- Title: On Creating a Causally Grounded Usable Rating Method for Assessing the Robustness of Foundation Models Supporting Time Series
- Authors: Kausik Lakkaraju, Rachneet Kaur, Parisa Zehtabi, Sunandita Patra, Siva Likitha Valluru, Zhen Zeng, Biplav Srivastava, Marco Valtorta,
- Abstract summary: We propose a causally grounded rating framework to study the robustness of Foundational Models for Time Series.
We evaluate six state-of-the-art (some multi-modal) FMTS across six prominent stocks spanning three industries.
- Score: 9.785749529142304
- Abstract: Foundation Models (FMs) have improved time series forecasting in various sectors, such as finance, but their vulnerability to input disturbances can hinder their adoption by stakeholders, such as investors and analysts. To address this, we propose a causally grounded rating framework to study the robustness of Foundational Models for Time Series (FMTS) with respect to input perturbations. We evaluate our approach on the stock price prediction problem, a well-studied problem with easily accessible public data, evaluating six state-of-the-art (some multi-modal) FMTS across six prominent stocks spanning three industries. The ratings proposed by our framework effectively assess the robustness of FMTS and also offer actionable insights for model selection and deployment. Within the scope of our study, we find that (1) multi-modal FMTS exhibit better robustness and accuracy compared to their uni-modal versions, and (2) FMTS pre-trained on the time series forecasting task exhibit better robustness and forecasting accuracy compared to general-purpose FMTS pre-trained across diverse settings. Further, to validate our framework's usability, we conduct a user study showcasing FMTS prediction errors along with our computed ratings. The study confirmed that our ratings reduced the difficulty for users in comparing the robustness of different systems.
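The abstract describes rating forecasters by how they respond to input perturbations. A minimal sketch of that general idea follows. It is not the paper's causally grounded method, which the abstract does not specify in detail. The two toy forecasters, the Gaussian noise perturbation, the mean-absolute-error gap, and all function names are assumptions made only for illustration.

```python
import random
import statistics


def naive_forecast(history):
    """Hypothetical stand-in model: predict the last observed value."""
    return history[-1]


def mean_forecast(history):
    """Hypothetical stand-in model: predict the mean of the input window."""
    return statistics.fmean(history)


def perturb(history, scale, rng):
    """Assumed perturbation: add zero-mean Gaussian noise to every input value."""
    return [x + rng.gauss(0.0, scale) for x in history]


def robustness_gap(model, windows, targets, scale=1.0, seed=0):
    """Mean absolute error on perturbed inputs minus MAE on clean inputs."""
    rng = random.Random(seed)
    clean = [abs(model(w) - y) for w, y in zip(windows, targets)]
    noisy = [abs(model(perturb(w, scale, rng)) - y)
             for w, y in zip(windows, targets)]
    return statistics.fmean(noisy) - statistics.fmean(clean)


def rate(models, windows, targets, **kwargs):
    """Rank models by gap; rating 1 means the smallest error increase."""
    gaps = {name: robustness_gap(m, windows, targets, **kwargs)
            for name, m in models.items()}
    ordered = sorted(gaps, key=gaps.get)
    ratings = {name: rank for rank, name in enumerate(ordered, start=1)}
    return ratings, gaps


if __name__ == "__main__":
    # Synthetic series for demonstration only; not stock data from the paper.
    series = [float(i % 7) for i in range(40)]
    windows = [series[i:i + 5] for i in range(30)]
    targets = [series[i + 5] for i in range(30)]
    models = {"naive": naive_forecast, "mean": mean_forecast}
    ratings, gaps = rate(models, windows, targets, scale=0.5)
    print(ratings, gaps)
```

A rank-based rating like this only orders the models relative to each other under one perturbation type; it makes no causal claim about why a model's error changes.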
Related papers
- Forecasting Company Fundamentals [19.363166648866066]
We evaluate 22 deterministic and probabilistic company fundamentals forecasting models on real company data.
We find that deep learning models provide superior forecasting performance to classical models.
We show how these high-quality forecasts can benefit automated stock allocation.
arXiv Detail & Related papers (2024-10-21T14:21:43Z)
- AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction [25.711345527738068]
Multimodal methods have faced two drawbacks.
They often fail to yield reliable models and overfit the data due to their absorption of information from the stock market.
Using multimodal models to predict stock volatility suffers from gender bias and lacks an efficient way to eliminate such bias.
Our comprehensive experiments on real-world financial audio datasets reveal that this method exceeds the performance of current state-of-the-art solutions.
arXiv Detail & Related papers (2024-07-03T18:40:53Z)
- Rating Multi-Modal Time-Series Forecasting Models (MM-TSFM) for Robustness Through a Causal Lens [10.103561529332184]
We focus on multi-modal time-series forecasting, where imprecision due to noisy or incorrect data can lead to erroneous predictions.
We introduce a rating methodology to assess the robustness of Multi-Modal Time-Series Forecasting Models.
arXiv Detail & Related papers (2024-06-12T17:39:16Z)
- Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
- Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting, including multiple models, supporting several datasets.
We devise two types of uncertainty in the problem to increase performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z)
- Two-stage Modeling for Prediction with Confidence [0.0]
It is difficult to generalize the performance of neural networks under the condition of distributional shift.
We propose a novel two-stage model for the potential distribution shift problem.
We show that our model offers reliable predictions for the vast majority of datasets.
arXiv Detail & Related papers (2022-09-19T08:48:07Z)
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines [65.0803400763215]
This work critically examines how adversarial robustness guarantees change when state-of-the-art certifiably robust models encounter out-of-distribution data.
We propose a novel data augmentation scheme, FourierMix, that produces augmentations to improve the spectral coverage of the training data.
We find that FourierMix augmentations help eliminate the spectral bias of certifiably robust models enabling them to achieve significantly better robustness guarantees on a range of OOD benchmarks.
arXiv Detail & Related papers (2021-12-01T17:11:22Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Forecasting COVID-19 daily cases using phone call data [0.0]
We propose a simple Multiple Linear Regression model optimised to use call data to forecast the number of daily confirmed cases.
Our proposed approach outperforms ARIMA, ETS and a regression model without call data, evaluated by three point forecast error metrics, one prediction interval and two probabilistic forecast accuracy measures.
arXiv Detail & Related papers (2020-10-05T18:07:07Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)