Alternative Loss Function in Evaluation of Transformer Models
- URL: http://arxiv.org/abs/2507.16548v2
- Date: Thu, 24 Jul 2025 09:56:46 GMT
- Title: Alternative Loss Function in Evaluation of Transformer Models
- Authors: Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk,
- Abstract summary: Mean Absolute Directional Loss function is more adequate for optimizing forecast-generating models.<n>We show that Transformer results are significantly better than those obtained with LSTM.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The proper design and architecture of testing machine learning models, especially in their application to quantitative finance problems, is crucial. The most important aspect of this process is selecting an adequate loss function for training, validation, estimation purposes, and hyperparameter tuning. Therefore, in this research, through empirical experiments on equity and cryptocurrency assets, we apply the Mean Absolute Directional Loss (MADL) function, which is more adequate for optimizing forecast-generating models used in algorithmic investment strategies. The MADL function results are compared between Transformer and LSTM models, and we show that in almost every case, Transformer results are significantly better than those obtained with LSTM.
Related papers
- Pruning-Based TinyML Optimization of Machine Learning Models for Anomaly Detection in Electric Vehicle Charging Infrastructure [8.29566258132752]
This paper investigates a pruning method for anomaly detection in resource-constrained environments, specifically targeting EVCI.<n> optimized models achieved significant reductions in model size and inference times, with only a marginal impact on their performance.<n> Notably, our findings indicate that, in the context of EVCI, pruning and FS can enhance computational efficiency while retaining critical anomaly detection capabilities.
arXiv Detail & Related papers (2025-03-19T00:18:37Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.<n>We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.<n>Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more computation-efficient metric for performance estimation.<n>We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z) - Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z) - Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness [0.0]
We investigate the trade-off between efficiency, performance, and adversarial robustness of Large Language Models (LLMs)
We conduct experiments on three prominent models with varying levels of complexity and efficiency -- Transformer++, Gated Linear Attention (GLA) Transformer, and MatMul-Free LM.
Our results show that while the GLA Transformer and MatMul-Free LM achieve slightly lower accuracy on GLUE tasks, they demonstrate higher efficiency and either superior or comparative robustness on AdvGLUE tasks.
arXiv Detail & Related papers (2024-08-08T16:54:40Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption.
We analyze how magnitude-based models affect generalization while improving adaption.
We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis [12.575399233846092]
This paper focuses on the application and optimization of LSTM model in financial risk prediction.
The optimized LSTM model shows significant advantages in AUC index compared with random forest, BP neural network and XGBoost.
arXiv Detail & Related papers (2024-05-31T03:31:17Z) - Mean Absolute Directional Loss as a New Loss Function for Machine
Learning Problems in Algorithmic Investment Strategies [0.0]
This paper investigates the issue of an adequate loss function in the optimization of machine learning models used in the forecasting of financial time series.
We propose the Mean Absolute Directional Loss function, solving important problems of classical forecast error functions in extracting information from forecasts to create efficient buy/sell signals.
arXiv Detail & Related papers (2023-09-19T11:52:13Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Relational Reasoning via Set Transformers: Provable Efficiency and
Applications to MARL [154.13105285663656]
A cooperative Multi-A gent R einforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.