Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost
- URL: http://arxiv.org/abs/2507.20008v1
- Date: Sat, 26 Jul 2025 16:55:16 GMT
- Title: Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost
- Authors: Padmavathi Moorthy,
- Abstract summary: This study examines three machine learning models-Graph Attention Networks (GAT), XGBoost, and TimesNet.<n>We evaluate their predictive capabilities for taxi fares using a real-world dataset comprising over 55 million records.<n>The study reveals critical differences between classical and deep learning models under realistic conditions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise fare prediction is crucial in ride-hailing platforms and urban mobility systems. This study examines three machine learning models-Graph Attention Networks (GAT), XGBoost, and TimesNet to evaluate their predictive capabilities for taxi fares using a real-world dataset comprising over 55 million records. Both raw (noisy) and denoised versions of the dataset are analyzed to assess the impact of data quality on model performance. The study evaluated the models along multiple axes, including predictive accuracy, calibration, uncertainty estimation, out-of-distribution (OOD) robustness, and feature sensitivity. We also explore pre-processing strategies, including KNN imputation, Gaussian noise injection, and autoencoder-based denoising. The study reveals critical differences between classical and deep learning models under realistic conditions, offering practical guidelines for building robust and scalable models in urban fare prediction systems.
Related papers
- Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset [0.0]
This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset.<n>We evaluate eight classification models: LightGBM, XGBoost, CatBoost, Random Forest, Extra Trees, HistGradientBoosting, k-Nearest Neighbors (KNN), and TabNet.<n>The models are assessed on accuracy, precision, recall, F1 score, and AUC to examine both predictive performance and robustness.
arXiv Detail & Related papers (2025-07-22T18:45:10Z) - SMI: An Information-Theoretic Metric for Predicting Model Knowledge Solely from Pre-Training Signals [51.60874286674908]
We aim to predict performance in closed-book question answering (QA), a vital downstream task indicative of a model's internal knowledge.<n>We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models.<n>Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics, model size, and QA accuracy.
arXiv Detail & Related papers (2025-02-06T13:23:53Z) - Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.<n>We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Towards a performance characteristic curve for model evaluation: an application in information diffusion prediction [2.8686437689115354]
We propose a metric based on information entropy to quantify the randomness in diffusion data.<n>We then identify a scaling pattern between the randomness and the prediction accuracy of the model.<n>The validity of the curve is tested by three prediction models in the same family.
arXiv Detail & Related papers (2023-09-18T07:32:57Z) - A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, can not reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z) - Learning Sample Difficulty from Pre-trained Models for Reliable
Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z) - A monitoring framework for deployed machine learning models with supply
chain examples [2.904613270228912]
We describe a framework for monitoring machine learning models; and, (2) its implementation for a big data supply chain application.
We use our implementation to study drift in model features, predictions, and performance on three real data sets.
arXiv Detail & Related papers (2022-11-11T14:31:38Z) - Analyzing Robustness of End-to-End Neural Models for Automatic Speech
Recognition [11.489161072526677]
We investigate robustness properties of pre-trained neural models for automatic speech recognition.
In this work, we perform a robustness analysis of the pre-trained neural models wav2vec2, HuBERT and DistilHuBERT on the LibriSpeech and TIMIT datasets.
arXiv Detail & Related papers (2022-08-17T20:00:54Z) - CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of
Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches.
The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving.
The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.