Related papers: FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting

URL: http://arxiv.org/abs/2502.18834v1
Date: Wed, 26 Feb 2025 05:19:16 GMT
Title: FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting
Authors: Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu-tao Xia, Dawei Cheng, Changjun Jiang
Abstract summary: Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
Score: 58.70072722290475
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Financial time series (FinTS) record the behavior of human-brain-augmented decision-making, capturing valuable historical information that can be leveraged for profitable investment strategies. Not surprisingly, this area has attracted considerable attention from researchers, who have proposed a wide range of methods based on various backbones. However, the evaluation of the area often exhibits three systemic limitations: (1) failure to account for the full spectrum of stock movement patterns observed in dynamic financial markets (Diversity Gap); (2) the absence of unified assessment protocols, which undermines the validity of cross-study performance comparisons (Standardization Deficit); and (3) neglect of critical market structure factors, resulting in inflated performance metrics that lack practical applicability (Real-World Mismatch). Addressing these limitations, we propose FinTSB, a comprehensive and practical benchmark for financial time series forecasting (FinTSF). To increase the variety, we categorize movement patterns into four specific parts, tokenize and pre-process the data, and assess the data quality based on some sequence characteristics. To eliminate biases due to different evaluation settings, we standardize the metrics across three dimensions and build a user-friendly, lightweight pipeline incorporating methods from various backbones. To accurately simulate real-world trading scenarios and facilitate practical implementation, we extensively model various regulatory constraints, including transaction fees, among others. Finally, we conduct extensive experiments on FinTSB, highlighting key insights to guide model selection under varying market conditions. Overall, FinTSB provides researchers with a novel and comprehensive platform for improving and evaluating FinTSF methods. The code is available at https://github.com/TongjiFinLab/FinTSBenchmark.

Related papers

FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs [2.06242362470764]
We introduce FinDPO, the first finance-specific sentiment analysis framework based on post-training human preference alignment. The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks. We show that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance.
arXiv Detail & Related papers (2025-07-24T13:57:05Z)
FinS-Pilot: A Benchmark for Online Financial System [17.65500174763836]
FinS-Pilot is a novel benchmark for evaluating large language models (RAGs) in online financial applications. Our benchmark incorporates both real-time API data and structured text sources, organized through an intent classification framework. Our work contributes both a practical evaluation framework and a curated dataset to advance research in financial NLP systems.
arXiv Detail & Related papers (2025-05-31T03:50:19Z)
Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting [10.540006708939647]
We introduce a Risk-aware Time-Series Predict-and-Allocate (RTS-PnO) framework to solve the problem of fund allocation. The framework contains three features: (i) end-to-end training with objective alignment measurement, (ii) adaptive forecasting uncertainty calibration, and (iii) agnostic towards forecasting models. The evaluation of RTS-PnO is conducted over both online and offline experiments.
arXiv Detail & Related papers (2025-05-30T17:36:45Z)
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis [13.92563557858618]
Large language models (LLMs) have sparked widespread adoption across diverse applications. Conventional evaluation metrics fall short when evaluating the quality of long-form answers. This is particularly critical in real-world scenarios involving extended questions, extensive context, and long-form answers. We propose an effective Extract, Match, and Score (EMS) evaluation approach tailored to the complexities of long-form LLMs' outputs.
arXiv Detail & Related papers (2025-03-20T09:38:44Z)
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance [79.78247299859656]
FinTMMBench is the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation systems in finance. Built from heterologous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages.
arXiv Detail & Related papers (2025-03-07T07:13:59Z)
FinMTEB: Finance Massive Text Embedding Benchmark [18.990655668481075]
We introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks. We show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity tasks.
arXiv Detail & Related papers (2025-02-16T04:23:52Z)
Demystifying Domain-adaptive Post-training for Financial LLMs [79.581577578952]
FINDAP is a systematic and fine-grained investigation into domain adaptive post-training of large language models (LLMs). Our approach consists of four key components: FinCap, FinRec, FinTrain and FinEval. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks.
arXiv Detail & Related papers (2025-01-09T04:26:15Z)
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning [5.65203350495478]
We present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark to analyze the reasoning capabilities of MLLMs. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model achieving only 30.4% accuracy on the most challenging tier.
arXiv Detail & Related papers (2024-12-17T05:50:55Z)
STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading [55.02735046724146]
In financial trading, factor models are widely used to price assets and capture excess returns from mispricing. We propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders, named STORM. STORM extracts features of stocks from temporal and spatial perspectives, then fuses and aligns these features at the fine-grained and semantic level, and represents the factors as multi-dimensional embeddings.
arXiv Detail & Related papers (2024-12-12T17:15:49Z)
FinLLM-B: When Large Language Models Meet Financial Breakout Trading [13.465954970263502]
FinLLM-B is the premier large language model for financial breakout detection. We have developed a novel framework for large language models, namely a multi-stage structure. Compared to GPT-3.5, FinLLM-B improves the average accuracy of answers and rationales by 49.97%, with the multi-stage structure contributing 9.72% to the improvement.
arXiv Detail & Related papers (2024-02-12T10:04:07Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets [84.90242084523565]
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics. By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention. By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z)
Interpretable ML-driven Strategy for Automated Trading Pattern Extraction [2.7910505923792646]
We propose a volume-based data pre-processing method for financial time series analysis. We use a statistical approach for assessing the performance of the method. Our analysis shows that the proposed method allows successful classification of the financial time series patterns.
arXiv Detail & Related papers (2021-03-23T09:55:46Z)
Gaussian process imputation of multiple financial series [71.08576457371433]
Multiple time series such as financial indicators, stock prices and exchange rates are strongly coupled due to their dependence on the latent state of the market. We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process.
arXiv Detail & Related papers (2020-02-11T19:18:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.