Related papers: Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction

URL: http://arxiv.org/abs/2501.00034v1
Date: Mon, 23 Dec 2024 21:28:32 GMT
Title: Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction
Authors: Chengyue Huang, Yahe Yang
Abstract summary: Conventional wisdom suggests that longer training periods and more feature variables contribute to improved model performance. This paper, focusing on mortgage default prediction, empirically discovers a phenomenon that contradicts traditional knowledge.
Score: 0.5755004576310334
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the widespread application of machine learning in financial risk management, conventional wisdom suggests that longer training periods and more feature variables contribute to improved model performance. This paper, focusing on mortgage default prediction, empirically discovers a phenomenon that contradicts traditional knowledge: in time series prediction, increased training data timespan and additional non-critical features actually lead to significant deterioration in prediction effectiveness. Using Fannie Mae's mortgage data, the study compares predictive performance across different time window lengths (2012-2022) and feature combinations, revealing that shorter time windows (such as single-year periods) paired with carefully selected key features yield superior prediction results. The experimental results indicate that extended time spans may introduce noise from historical data and outdated market patterns, while excessive non-critical features interfere with the model's learning of core default factors. This research not only challenges the traditional "more is better" approach in data modeling but also provides new insights and practical guidance for feature selection and time window optimization in financial risk prediction.
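Illustration: the comparison described in the abstract (different training window lengths crossed with different feature combinations) can be laid out as a small experiment grid. The sketch below is only one possible way to enumerate such a grid and is not the authors' code. The feature names, the window lengths, and the choice to end every training window in 2021 (leaving 2022 as a held-out year) are assumptions made for illustration.

```python
from itertools import product


def training_windows(first_year, last_year, lengths):
    """Return inclusive (start, end) year ranges that all end at last_year.

    Lengths that would reach back before first_year are skipped.
    """
    windows = []
    for n in lengths:
        start = last_year - n + 1
        if start >= first_year:
            windows.append((start, last_year))
    return windows


def experiment_grid(first_year, last_year, lengths, feature_sets):
    """Pair every training window with every named feature set."""
    windows = training_windows(first_year, last_year, lengths)
    return [
        {"train_years": window, "feature_set": name, "features": cols}
        for window, (name, cols) in product(windows, feature_sets.items())
    ]


# Hypothetical feature names, chosen only to make the sketch concrete.
FEATURE_SETS = {
    "key_only": ["credit_score", "ltv", "dti"],
    "all": ["credit_score", "ltv", "dti", "property_type", "channel", "num_units"],
}

# Assumption: data from 2012 onward, training windows ending in 2021.
grid = experiment_grid(2012, 2021, lengths=[1, 3, 5, 10], feature_sets=FEATURE_SETS)
for cfg in grid:
    print(cfg["train_years"], cfg["feature_set"], len(cfg["features"]))
```

Each entry in the grid would then be used to train and evaluate one model on the same held-out period, so that a single-year window with key features can be compared directly against longer windows and larger feature sets.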

Related papers

Deconfounding Time Series Forecasting [1.5967186772129907]
Time series forecasting is a critical task in various domains, where accurate predictions can drive informed decision-making. Traditional forecasting methods often rely on current observations of variables to predict future outcomes. We propose an enhanced forecasting approach that incorporates representations of latent confounders derived from historical data.
arXiv Detail & Related papers (2024-10-27T12:45:42Z)
A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios [11.141688859736805]
We introduce a machine learning model for credit risk by combining tree-boosting with a latent temporal Gaussian process model accounting for frailty correlation. We find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions are more accurate compared to conventional independent linear hazard models.
arXiv Detail & Related papers (2024-10-03T15:10:55Z)
Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting [65.40983982856056]
We introduce STOIC, which leverages correlations between time-series to learn the underlying structure between them and to provide well-calibrated and accurate forecasts. Over a wide range of benchmark datasets, STOIC provides 16% more accurate and better-calibrated forecasts.
arXiv Detail & Related papers (2024-07-02T20:14:32Z)
Rating Multi-Modal Time-Series Forecasting Models (MM-TSFM) for Robustness Through a Causal Lens [10.103561529332184]
We focus on multi-modal time-series forecasting, where imprecision due to noisy or incorrect data can lead to erroneous predictions. We introduce a rating methodology to assess the robustness of Multi-Modal Time-Series Forecasting Models.
arXiv Detail & Related papers (2024-06-12T17:39:16Z)
Enhancing Mean-Reverting Time Series Prediction with Gaussian Processes: Functional and Augmented Data Structures in Financial Forecasting [0.0]
We explore the application of Gaussian Processes (GPs) for predicting mean-reverting time series with an underlying structure. GPs offer the potential to forecast not just the average prediction but the entire probability distribution over a future trajectory. This is particularly beneficial in financial contexts, where accurate predictions alone may not suffice if incorrect volatility assessments lead to capital losses.
arXiv Detail & Related papers (2024-02-23T06:09:45Z)
Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step. We propose a practical Primal-Dual algorithm to tackle it, and aim to demonstrate that it exhibits competitive average performance in time series benchmarks, while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z)
Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective. We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts. We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z)
Data Scaling Effect of Deep Learning in Financial Time Series Forecasting [5.299784478982814]
This study highlights the importance of global training, where the deep learning model is optimized across a wide spectrum of stocks. We show that a globally trained deep learning model is capable of delivering accurate zero-shot forecasts for any stock.
arXiv Detail & Related papers (2023-09-05T09:18:45Z)
DeepVol: Volatility Forecasting from High-Frequency Data with Dilated Causal Convolutions [53.37679435230207]
We propose DeepVol, a model based on Dilated Causal Convolutions that uses high-frequency data to forecast day-ahead volatility. Our empirical results suggest that the proposed deep learning-based approach effectively learns global features from high-frequency data.
arXiv Detail & Related papers (2022-09-23T16:13:47Z)
Uncertainty-Aware Time-to-Event Prediction using Deep Kernel Accelerated Failure Time Models [11.171712535005357]
We propose Deep Kernel Accelerated Failure Time models for the time-to-event prediction task. Our model shows better point estimate performance than recurrent neural network based baselines in experiments on two real-world datasets.
arXiv Detail & Related papers (2021-07-26T14:55:02Z)
Low-Rank Temporal Attention-Augmented Bilinear Network for financial time-series forecasting [93.73198973454944]
Deep learning models have led to significant performance improvements in many problems coming from different domains, including prediction problems of financial time-series data. The Temporal Attention-Augmented Bilinear network was recently proposed as an efficient and high-performing model for Limit Order Book time-series forecasting. In this paper, we propose a low-rank tensor approximation of the model to further reduce the number of trainable parameters and increase its speed.
arXiv Detail & Related papers (2021-07-05T10:15:23Z)
Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task. The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. We formulate a novel problem and a neural framework, Back2Future, that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.