Generative Machine Learning for Multivariate Equity Returns
- URL: http://arxiv.org/abs/2311.14735v1
- Date: Tue, 21 Nov 2023 18:41:48 GMT
- Title: Generative Machine Learning for Multivariate Equity Returns
- Authors: Ruslan Tepelyan, Achintya Gopal
- Abstract summary: We study the efficacy of conditional importance weighted autoencoders and conditional normalizing flows for the task of modeling the returns of equities.
The main problem we work to address is modeling the joint distribution of all the members of the S&P 500, or, in other words, learning a 500-dimensional joint distribution.
We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis, and portfolio optimization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of machine learning to generate synthetic data has grown in
popularity with the proliferation of text-to-image models and especially large
language models. The core methodology these models use is to learn the
distribution of the underlying data, similar to the classical methods common in
finance of fitting statistical models to data. In this work, we explore the
efficacy of using modern machine learning methods, specifically conditional
importance weighted autoencoders (a variant of variational autoencoders) and
conditional normalizing flows, for the task of modeling the returns of
equities. The main problem we work to address is modeling the joint
distribution of all the members of the S&P 500, or, in other words, learning a
500-dimensional joint distribution. We show that this generative model has a
broad range of applications in finance, including generating realistic
synthetic data, volatility and correlation estimation, risk analysis (e.g.,
value at risk, or VaR, of portfolios), and portfolio optimization.
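The paper's fitted model is not reproduced here, but the downstream applications it lists (VaR estimation, correlation estimation) follow a common Monte Carlo pattern: draw many joint return samples from the generative model, then compute statistics on those samples. The sketch below is a minimal illustration of that pattern, with a correlated multivariate normal standing in as a placeholder for the conditional IWAE or normalizing flow; the asset count, covariance, and portfolio weights are all toy assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sampler: in the paper, samples would come from the fitted
# conditional IWAE or normalizing flow over all 500 S&P 500 members.
# Here we draw 5 assets from a toy multivariate normal (std 1%, corr 0.5).
n_assets, n_samples = 5, 100_000
cov = 0.0001 * (0.5 * np.eye(n_assets) + 0.5)
returns = rng.multivariate_normal(np.zeros(n_assets), cov, size=n_samples)

# Monte Carlo VaR of an equal-weighted portfolio at the 95% level:
weights = np.full(n_assets, 1.0 / n_assets)
portfolio = returns @ weights
var_95 = -np.quantile(portfolio, 0.05)  # loss exceeded 5% of the time

# Correlation estimation from the same synthetic sample:
corr = np.corrcoef(returns, rowvar=False)

print(f"95% VaR: {var_95:.5f}")
print(f"mean off-diagonal correlation: "
      f"{corr[np.triu_indices(n_assets, 1)].mean():.3f}")
```

With a richer generative model in place of the normal sampler, the same two statistics capture fat tails and state-dependent correlations that a fixed Gaussian cannot.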
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities [0.0]
We introduce NeuralFactors, a novel machine-learning based approach to factor analysis where a neural network outputs factor exposures and factor returns.
We show that this model outperforms prior approaches in terms of log-likelihood performance and computational efficiency.
arXiv Detail & Related papers (2024-08-02T18:01:09Z)
- Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications [0.0]
This study explores model adaptation and generalization by utilizing synthetic data.
We employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity.
Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty.
arXiv Detail & Related papers (2024-05-03T10:05:31Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Bayesian Active Learning for Discrete Latent Variable Models [19.852463786440122]
Active learning seeks to reduce the amount of data required to fit the parameters of a model.
Latent variable models play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines.
arXiv Detail & Related papers (2022-02-27T19:07:12Z)
- A transformer-based model for default prediction in mid-cap corporate markets [13.535770763481905]
We study mid-cap companies with less than US $10 billion in market capitalisation.
We aim to predict the default probability term structure over the medium term.
We identify which data sources contribute most to default risk.
arXiv Detail & Related papers (2021-11-18T19:01:00Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Robust pricing and hedging via neural SDEs [0.0]
We develop and analyse novel algorithms needed for efficient use of neural SDEs.
We find robust bounds for prices of derivatives and the corresponding hedging strategies while incorporating relevant market data.
Neural SDEs allow consistent calibration under both the risk-neutral and the real-world measures.
arXiv Detail & Related papers (2020-07-08T14:33:17Z)
- CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.