Learning Shortest Paths When Data is Scarce
- URL: http://arxiv.org/abs/2601.03629v1
- Date: Wed, 07 Jan 2026 06:19:04 GMT
- Title: Learning Shortest Paths When Data is Scarce
- Authors: Dmytro Matsypura, Yu Pan, Hanzhao Wang,
- Abstract summary: We study a shortest-path problem in which a planner has access to abundant synthetic samples, limited real-world observations, and an edge-similarity capturing expected behavioral similarity across links.<n>We model the simulator-to-reality discrepancy as an unknown, edge-specific bias that varies smoothly over the similarity graph, and estimate it using Laplacian-regularized least squares.<n>For cold-start settings without initial real data, we develop a bias-aware active learning algorithm that adaptively selects edges to measure until a prescribed accuracy is met.
- Score: 3.3012620893449465
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Digital twins and other simulators are increasingly used to support routing decisions in large-scale networks. However, simulator outputs often exhibit systematic bias, while ground-truth measurements are costly and scarce. We study a stochastic shortest-path problem in which a planner has access to abundant synthetic samples, limited real-world observations, and an edge-similarity structure capturing expected behavioral similarity across links. We model the simulator-to-reality discrepancy as an unknown, edge-specific bias that varies smoothly over the similarity graph, and estimate it using Laplacian-regularized least squares. This approach yields calibrated edge cost estimates even in data-scarce regimes. We establish finite-sample error bounds, translate estimation error into path-level suboptimality guarantees, and propose a computable, data-driven certificate that verifies near-optimality of a candidate route. For cold-start settings without initial real data, we develop a bias-aware active learning algorithm that leverages the simulator and adaptively selects edges to measure until a prescribed accuracy is met. Numerical experiments on multiple road networks and traffic graphs further demonstrate the effectiveness of our methods.
Related papers
- Optimal training-conditional regret for online conformal prediction [20.643619398558315]
We study online conformal prediction for non-stationary data streams subject to unknown distribution drift.<n>We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift.<n>We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets.
arXiv Detail & Related papers (2026-02-18T15:31:15Z) - Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification [3.126303871979975]
SHIFT (Spectral Heteroscedastic Informed Forecasting for Trajectories) is a novel framework that combines well-calibrated uncertainty modeling with informative priors.<n>Our model excels in complex scenarios, such as intersections, where uncertainty is inherently higher.
arXiv Detail & Related papers (2025-04-17T17:24:50Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Unsupervised Model Drift Estimation with Batch Normalization Statistics
for Dataset Shift Detection and Model Selection [0.0]
We propose a novel method of model drift estimation by exploiting statistics of batch normalization layer on unlabeled test data.
We show the effectiveness of our method not only on dataset shift detection but also on model selection when there are multiple candidate models among model zoo or training trajectories in an unsupervised way.
arXiv Detail & Related papers (2021-07-01T03:04:47Z) - Approximate Bayesian Computation with Path Signatures [0.5156484100374059]
We introduce the use of path signatures as a natural candidate feature set for constructing distances between time series data.
Our experiments show that such an approach can generate more accurate approximate Bayesian posteriors than existing techniques for time series models.
arXiv Detail & Related papers (2021-06-23T17:25:43Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50times$), with significantly reduced training data generation (up to $30times$) and better accuracy ($+8.7%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.