Multi-Modal Financial Time-Series Retrieval Through Latent Space
Projections
- URL: http://arxiv.org/abs/2309.16741v2
- Date: Tue, 2 Jan 2024 10:18:24 GMT
- Title: Multi-Modal Financial Time-Series Retrieval Through Latent Space
Projections
- Authors: Tom Bamford, Andrea Coletta, Elizabeth Fons, Sriram Gopalakrishnan,
Svitlana Vyetrenko, Tucker Balch, Manuela Veloso
- Abstract summary: We propose a framework to store multi-modal data for financial time-series in a lower-dimensional latent space using deep encoders.
Our approach allows user-friendly query interfaces that accept natural-language text or sketches of time-series.
- Score: 9.092375785645306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Financial firms commonly process and store billions of time-series data points, generated continuously and at high frequency. To support efficient data
storage and retrieval, specialized time-series databases and systems have
emerged. These databases support indexing and querying of time-series by a
constrained Structured Query Language (SQL)-like format, enabling rigid queries
such as "Stocks with monthly price returns greater than 5%". However, such
queries do not capture the intrinsic complexity of high-dimensional
time-series data, which can often be better described by images or
language (e.g., "A stock in a low-volatility regime"). Moreover, the required
storage, computational time, and retrieval complexity to search in the
time-series space are often non-trivial. In this paper, we propose and
demonstrate a framework to store multi-modal data for financial time-series in
a lower-dimensional latent space using deep encoders, such that the latent
space projections capture not only the time series trends but also other
desirable information or properties of the financial time-series data (such as
price volatility). Moreover, our approach allows user-friendly querying, for
which we have developed intuitive interfaces that accept natural-language text
or sketches of time-series. We demonstrate the advantages of
our method in terms of computational efficiency and accuracy on real historical
data as well as synthetic data, and highlight the utility of latent-space
projections in the storage and retrieval of financial time-series data with
intuitive query modalities.
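
To make the retrieval mechanism concrete, the sketch below illustrates the general idea of latent-space time-series retrieval described in the abstract. It is a minimal, hypothetical example only: it substitutes a linear PCA-style projection for the paper's deep encoders and assumes cosine similarity as the retrieval metric (the abstract confirms neither); the data is synthetic and all names (`encode`, `retrieve`) are illustrative, not the authors' API.

```python
# Minimal sketch: store time-series as low-dimensional latent vectors,
# then answer a "sketch" query by similarity search in the latent space.
# PCA stands in for the paper's deep encoders (an assumption, not the method).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "database": 10,000 price-return series of length 64.
series = rng.normal(0.0, 0.02, size=(10_000, 64)).cumsum(axis=1)

# Fit a linear encoder via SVD (PCA). The paper instead trains deep encoders
# so the latent space also captures properties such as volatility.
k = 16                                    # latent dimensionality
mean = series.mean(axis=0)
_, _, vt = np.linalg.svd(series - mean, full_matrices=False)
components = vt[:k]                       # (k, 64) projection matrix

def encode(x: np.ndarray) -> np.ndarray:
    """Project series of shape (n, 64) into the k-dimensional latent space."""
    return (x - mean) @ components.T

latents = encode(series)                  # store these instead of raw series

def retrieve(query: np.ndarray, top_n: int = 5) -> np.ndarray:
    """Return indices of the top_n stored series by cosine similarity."""
    q = encode(query[None, :])[0]
    sims = (latents @ q) / (
        np.linalg.norm(latents, axis=1) * np.linalg.norm(q) + 1e-12
    )
    return np.argsort(-sims)[:top_n]

# A hand-drawn "sketch" query: a rough upward trend, as a user might draw it.
sketch = np.linspace(0.0, 1.0, 64) + rng.normal(0.0, 0.05, 64)
print(retrieve(sketch))                   # indices of the closest stored series
```

A natural-language query such as "A stock in a low-volatility regime" would additionally require a text encoder trained to map descriptions into the same latent space; the sketch-query path above shows the shared mechanism in either case: project the query, then rank the stored latent vectors by similarity.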
Related papers
- FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance [79.78247299859656]
FinTMMBench is the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation systems in finance.
Built from heterogeneous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages.
arXiv Detail & Related papers (2025-03-07T07:13:59Z) - Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement [55.2439260314328]
Time Series Multi-Task Question Answering (Time-MQA) is a unified framework that enables natural language queries across multiple time series tasks.
Central to Time-MQA is the TSQA dataset, a large-scale dataset containing ~200k question-answer pairs.
arXiv Detail & Related papers (2025-02-26T13:47:13Z) - TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents [52.13094810313054]
TimeCAP is a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data.
TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions.
Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction.
arXiv Detail & Related papers (2025-02-17T04:17:27Z) - Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative [65.84249211767921]
Texts as Time Series (TaTS) considers the time-series-paired texts to be auxiliary variables of the time series.
TaTS can be plugged into any existing numerical-only time series model, enabling it to handle time series data with paired texts effectively.
arXiv Detail & Related papers (2025-02-13T03:43:27Z) - Enhancing Financial Time-Series Forecasting with Retrieval-Augmented Large Language Models [29.769616823587594]
We propose the first retrieval-augmented generation (RAG) framework specifically designed for financial time-series forecasting.
Our framework incorporates three key innovations: a fine-tuned 1B large language model (StockLLM) as its backbone, a novel candidate selection method enhanced by LLM feedback, and a training objective that maximizes the similarity between queries and historically significant sequences.
arXiv Detail & Related papers (2025-02-09T12:26:05Z) - TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models [54.44272772296578]
Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification (MTSC).
Existing methods encode time-series embeddings from scratch within the latent space of LLMs to align with the LLMs' semantic space.
We propose TableTime, which reformulates MTSC as a table understanding task.
arXiv Detail & Related papers (2024-11-24T07:02:32Z) - Timeseria: an object-oriented time series processing library [0.40964539027092917]
Timeseria is an object-oriented time series processing library implemented in Python.
It aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it.
arXiv Detail & Related papers (2024-10-12T15:29:18Z) - StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction [13.52020491768311]
We introduce StockTime, a novel LLM-based architecture designed specifically for stock price time series data, unlike recent FinLLMs.
By fusing multimodal data, StockTime effectively predicts stock prices across arbitrary look-back periods.
arXiv Detail & Related papers (2024-08-25T00:50:33Z) - KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates more than 10 state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z) - Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding [57.62275091656578]
We refer to a complex event composed of many news articles over an extended period as a Temporal Complex Event (TCE).
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within a TCE.
arXiv Detail & Related papers (2024-06-04T16:42:17Z) - SalienTime: User-driven Selection of Salient Time Steps for Large-Scale
Geospatial Data Visualization [16.343440057064864]
We propose a novel approach that leverages autoencoders and dynamic programming to facilitate user-driven temporal selections.
We design and implement a web-based interface to enable efficient and context-aware selection of time steps.
arXiv Detail & Related papers (2024-03-06T04:27:10Z) - Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [49.724842920942024]
Industries such as finance, meteorology, and energy generate vast amounts of data daily.
We propose Data-Copilot, a data analysis agent that autonomously performs querying, processing, and visualization of massive data tailored to diverse human requests.
arXiv Detail & Related papers (2023-06-12T16:12:56Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA).
Our proposed model exhibits reduced computational complexity and performance comparable to or better than existing methods.
arXiv Detail & Related papers (2022-10-02T06:58:49Z) - Attention Augmented Convolutional Transformer for Tabular Time-series [0.9137554315375922]
Time-series classification is one of the most frequently performed tasks in industrial data science.
We propose a novel scalable architecture for learning representations from time-series data.
Our proposed model is end-to-end and can handle both categorical and continuous valued inputs.
arXiv Detail & Related papers (2021-10-05T05:20:46Z) - Machine Learning for Temporal Data in Finance: Challenges and
Opportunities [0.0]
Temporal data are ubiquitous in the financial services (FS) industry, but machine learning efforts often fail to account for their temporal richness.
arXiv Detail & Related papers (2020-09-11T19:39:27Z)