Generative Machine Learning for Multivariate Equity Returns
- URL: http://arxiv.org/abs/2311.14735v1
- Date: Tue, 21 Nov 2023 18:41:48 GMT
- Title: Generative Machine Learning for Multivariate Equity Returns
- Authors: Ruslan Tepelyan, Achintya Gopal
- Abstract summary: We study the efficacy of conditional importance weighted autoencoders and conditional normalizing flows for the task of modeling the returns of equities.
The main problem we address is modeling the joint distribution of all members of the S&P 500; in other words, learning a 500-dimensional joint distribution.
We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis, and portfolio optimization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of machine learning to generate synthetic data has grown in
popularity with the proliferation of text-to-image models and especially large
language models. The core methodology of these models is to learn the
distribution of the underlying data, similar to the classical approach, common
in finance, of fitting statistical models to data. In this work, we explore the
efficacy of modern machine learning methods, specifically conditional
importance weighted autoencoders (a variant of variational autoencoders) and
conditional normalizing flows, for the task of modeling the returns of
equities. The main problem we address is modeling the joint distribution of
all members of the S&P 500; in other words, learning a 500-dimensional joint
distribution. We show that this generative model has a broad range of
applications in finance, including generating realistic synthetic data,
volatility and correlation estimation, risk analysis (e.g., value at risk, or
VaR, of portfolios), and portfolio optimization.
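A minimal sketch of how a trained joint-return sampler like this could drive the correlation-estimation and VaR applications above. `sample_returns` is a hypothetical stand-in (a toy one-factor Gaussian) for the paper's conditional model; the Monte Carlo machinery around it is generic.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ASSETS = 500

# Hypothetical stand-in for the paper's trained generative model (e.g., a
# conditional normalizing flow over all S&P 500 members). A toy one-factor
# Gaussian keeps the sketch runnable; the real model would condition on
# recent market history and capture far richer dependence.
beta = rng.uniform(0.5, 1.5, size=N_ASSETS)  # toy market betas

def sample_returns(context: np.ndarray, n_samples: int) -> np.ndarray:
    """Draw joint next-day return samples with shape (n_samples, N_ASSETS)."""
    market = 0.01 * rng.standard_normal(n_samples)             # common factor
    idio = 0.015 * rng.standard_normal((n_samples, N_ASSETS))  # idiosyncratic
    return np.outer(market, beta) + idio

# Monte Carlo versions of the applications in the abstract: correlation
# estimation and value at risk (VaR) of a portfolio.
context = np.zeros(10)                        # placeholder market state
samples = sample_returns(context, 20_000)

corr = np.corrcoef(samples, rowvar=False)     # 500 x 500 correlation estimate

weights = np.full(N_ASSETS, 1.0 / N_ASSETS)   # equal-weight portfolio
pnl = samples @ weights
var_99 = -np.quantile(pnl, 0.01)              # one-day 99% VaR
print(f"estimated 99% VaR: {var_99:.4%}")
```

The same pattern covers the other listed applications: synthetic data is just the raw samples, and portfolio optimization searches over `weights` against statistics of `pnl`.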
Related papers
- Quantifying Correlations of Machine Learning Models [8.834929420051534]
This paper explores three scenarios where error correlations between multiple models arise, resulting in aggregated risks.
Our findings indicate that aggregated risks are substantial, particularly when models share similar algorithms, training datasets, or foundational models.
Overall, we observe that correlations across models are pervasive and likely to intensify with increased reliance on foundational models and widely used public datasets.
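A toy illustration of this aggregation effect (an assumption-laden sketch, not the paper's experiments): two models that share an error source make correlated mistakes, so joint failures are far more likely than independence would suggest.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5000
y = rng.standard_normal(n)                   # ground truth
shared_bias = 0.5 * rng.standard_normal(n)   # error source both models share

pred_a = y + shared_bias + 0.3 * rng.standard_normal(n)
pred_b = y + shared_bias + 0.3 * rng.standard_normal(n)

err_a, err_b = pred_a - y, pred_b - y
rho = np.corrcoef(err_a, err_b)[0, 1]
print(f"error correlation: {rho:.2f}")       # ~0.7, far from independent

# Probability that both models err badly at once, versus what an
# independence assumption would predict.
bad = 0.6
p_joint = np.mean((np.abs(err_a) > bad) & (np.abs(err_b) > bad))
p_indep = np.mean(np.abs(err_a) > bad) * np.mean(np.abs(err_b) > bad)
print(f"joint tail failure: {p_joint:.3f} vs independent: {p_indep:.3f}")
```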
arXiv Detail & Related papers (2025-02-06T10:19:51Z)
- Synthetic Data for Portfolios: A Throw of the Dice Will Never Abolish Chance [0.0]
This paper aims to contribute to a deeper understanding of the limitations of generative models, particularly in portfolio and risk management.
We highlight the inseparable nature of model development and the desired use case by touching on a paradox: generic generative models inherently care less about what is important for constructing portfolios.
arXiv Detail & Related papers (2025-01-07T18:50:24Z)
- Structure Learning in Gaussian Graphical Models from Glauber Dynamics [6.982878344925993]
We present the first algorithm for Gaussian model selection when data are sampled according to the Glauber dynamics.
We provide guarantees on the computational and statistical complexity of the proposed algorithm.
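For context, a minimal sketch of the data-generating process the paper studies: Glauber dynamics over a Gaussian graphical model resamples one coordinate at a time from its conditional. The chain-graph precision matrix below is illustrative, and the sketch shows only the sampler, not the paper's structure-learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

p = 5
theta = np.eye(p)                 # precision matrix of a chain graph
for i in range(p - 1):
    theta[i, i + 1] = theta[i + 1, i] = 0.4

def glauber_step(x: np.ndarray) -> np.ndarray:
    """Resample one random coordinate from its Gaussian conditional."""
    i = rng.integers(len(x))
    # x_i | x_-i ~ N(-sum_{j != i} theta_ij x_j / theta_ii, 1 / theta_ii)
    mean = -(theta[i] @ x - theta[i, i] * x[i]) / theta[i, i]
    x[i] = mean + rng.standard_normal() / np.sqrt(theta[i, i])
    return x

x = np.zeros(p)
samples = np.array([glauber_step(x).copy() for _ in range(10_000)])
# After burn-in the samples approximate N(0, theta^{-1}); the empirical
# precision matrix recovers the chain graph's sparsity pattern.
emp_prec = np.linalg.inv(np.cov(samples[2_000:], rowvar=False))
print(np.round(emp_prec, 1))
```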
arXiv Detail & Related papers (2024-12-24T18:49:13Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities [0.0]
We introduce NeuralFactors, a novel machine-learning-based approach to factor analysis in which a neural network outputs factor exposures and factor returns.
We show that this model outperforms prior approaches in terms of log-likelihood performance and computational efficiency.
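A minimal sketch of the factor decomposition this describes, under simplifying assumptions: here only the exposures come from a network and the factor returns are passed in, whereas the paper's model also outputs a factor-return distribution; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

N_ASSETS, N_FACTORS, FEAT_DIM = 500, 8, 32

class FactorNet(nn.Module):
    """Toy factor model: a network maps asset features to factor exposures."""

    def __init__(self):
        super().__init__()
        self.exposure_net = nn.Sequential(
            nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, N_FACTORS))
        self.log_sigma = nn.Parameter(torch.zeros(N_ASSETS))  # idio vols

    def log_likelihood(self, feats, factor_returns, returns):
        # feats: (N_ASSETS, FEAT_DIM); factor_returns: (N_FACTORS,)
        exposures = self.exposure_net(feats)       # (N_ASSETS, N_FACTORS)
        mean = exposures @ factor_returns          # model-implied returns
        dist = torch.distributions.Normal(mean, self.log_sigma.exp())
        return dist.log_prob(returns).sum()

model = FactorNet()
feats = torch.randn(N_ASSETS, FEAT_DIM)      # illustrative asset features
f = 0.01 * torch.randn(N_FACTORS)            # one day of factor returns
r = 0.02 * torch.randn(N_ASSETS)             # observed asset returns
loss = -model.log_likelihood(feats, f, r)    # train by minimizing this
loss.backward()
```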
arXiv Detail & Related papers (2024-08-02T18:01:09Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Oftentimes the fine-tuned models are available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
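The simplest instance of parameter-space merging is uniform weight averaging, sketched below; the paper proposes a more careful weighted merge, but the mechanics of fusing state dicts without any training data look like this.

```python
import torch.nn as nn

def merge_state_dicts(state_dicts, weights=None):
    """Average parameters across fine-tuned models of identical shape."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
            for key in state_dicts[0]}

# Two fine-tuned copies of the same architecture, fused without any data.
model_a, model_b = nn.Linear(10, 2), nn.Linear(10, 2)
fused = nn.Linear(10, 2)
fused.load_state_dict(merge_state_dicts([model_a.state_dict(),
                                         model_b.state_dict()]))
```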
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
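A toy sketch of the idea (a loose simplification, not the paper's estimator): the maximum-entropy density matching a group's observed mean and covariance is Gaussian, so one can draw pseudo-examples from it, attach the group's positive rate as a soft label, and fit a logistic model.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_logistic(X, soft_y, lr=0.1, steps=500):
    """Gradient ascent on cross-entropy with fractional (soft) labels."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (soft_y - p) / len(soft_y)
    return w

groups = [  # (mean, cov, positive rate, count) -- aggregated statistics
    (np.array([1.0, 0.0]), np.eye(2), 0.8, 300),
    (np.array([-1.0, 0.5]), np.eye(2), 0.2, 300),
]
X, y = [], []
for mu, cov, rate, n in groups:
    X.append(rng.multivariate_normal(mu, cov, size=n))  # max-entropy draws
    y.append(np.full(n, rate))                          # soft group labels
w = fit_logistic(np.vstack(X), np.concatenate(y))
print(np.round(w, 2))
```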
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Bayesian Active Learning for Discrete Latent Variable Models [19.852463786440122]
Active learning seeks to reduce the amount of data required to fit the parameters of a model.
Latent variable models play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines.
arXiv Detail & Related papers (2022-02-27T19:07:12Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)