Statistical inference of travelers' route choice preferences with
system-level data
- URL: http://arxiv.org/abs/2204.10964v1
- Date: Sat, 23 Apr 2022 00:38:32 GMT
- Title: Statistical inference of travelers' route choice preferences with
system-level data
- Authors: Pablo Guarda, Sean Qian
- Abstract summary: We develop a methodology to estimate travelers' utility functions with multiple attributes using system-level data.
Experiments on synthetic data show that the coefficients are consistently recovered and that hypothesis tests are a reliable statistic to identify which attributes are determinants of travelers' route choices.
The methodology is also deployed at a large scale using real-world multi-source data from Fresno, CA, collected before and during the COVID-19 outbreak.
- Score: 4.120057972557892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional network models encapsulate travel behavior among all
origin-destination pairs based on a simplified and generic utility function.
Typically, the utility function consists of travel time solely and its
coefficients are equated to estimates obtained from stated preference data.
While this modeling strategy is reasonable, the inherent sampling bias in
individual-level data may be further amplified over network flow aggregation,
leading to inaccurate flow estimates. This data must be collected from surveys
or travel diaries, which may be labor intensive, costly and limited to a small
time period. To address these limitations, this study extends classical
bi-level formulations to estimate travelers' utility functions with multiple
attributes using system-level data. We formulate a methodology grounded on
non-linear least squares to statistically infer travelers' utility function in
the network context using traffic counts, traffic speeds, traffic incidents and
sociodemographic information, among other attributes. The analysis of the
mathematical properties of the optimization problem and of its pseudo-convexity
motivates the use of normalized gradient descent. We also develop a hypothesis
test framework to examine statistical properties of the utility function
coefficients and to perform attribute selection. Experiments on synthetic data
show that the coefficients are consistently recovered and that hypothesis tests
are a reliable statistic to identify which attributes are determinants of
travelers' route choices. In addition, a series of Monte Carlo experiments suggests
that statistical inference is robust to noise in the Origin-Destination matrix
and in the traffic counts, and to various levels of sensor coverage. The
methodology is also deployed at a large scale using real-world multi-source
data in Fresno, CA collected before and during the COVID-19 outbreak.
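
The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes strong simplifying assumptions: a single origin-destination pair with two routes, a binary logit route choice model, fixed attributes (no congestion feedback, so no bi-level equilibrium), and synthetic data. All function and variable names are hypothetical. Utility coefficients are fitted to observed counts by non-linear least squares using normalized gradient descent, and t-statistics are then computed from the standard Gauss-Newton covariance approximation, which may differ from the test developed in the paper.

```python
import numpy as np

def route_share(x, theta):
    """Binary logit share of route 1, given attribute differences x (route 1 minus route 2)."""
    return 1.0 / (1.0 + np.exp(-(x @ theta)))

def predicted_counts(x, theta, demand):
    """Predicted traffic count on route 1 for each observation period."""
    return demand * route_share(x, theta)

def jacobian(x, theta, demand):
    """Derivative of the predicted counts with respect to theta (n x K matrix)."""
    p = route_share(x, theta)
    return (demand * p * (1.0 - p))[:, None] * x

def fit_ngd(x, counts, demand, iters=5000, step0=0.5):
    """Minimize the sum of squared count errors with normalized gradient descent.

    Each update moves a fixed distance along the negative gradient direction;
    the step length decays as step0 / sqrt(k + 1). The best iterate is returned.
    """
    theta = np.zeros(x.shape[1])
    best_theta, best_loss = theta.copy(), np.inf
    for k in range(iters):
        resid = predicted_counts(x, theta, demand) - counts
        loss = float(resid @ resid)
        if loss < best_loss:
            best_theta, best_loss = theta.copy(), loss
        grad = 2.0 * jacobian(x, theta, demand).T @ resid
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        theta = theta - (step0 / np.sqrt(k + 1.0)) * grad / norm
    return best_theta

def t_statistics(x, counts, demand, theta):
    """t-statistics for H0: coefficient = 0, using cov = s^2 (J^T J)^{-1}."""
    n, k = x.shape
    resid = predicted_counts(x, theta, demand) - counts
    sigma2 = float(resid @ resid) / (n - k)
    J = jacobian(x, theta, demand)
    cov = sigma2 * np.linalg.inv(J.T @ J)
    return theta / np.sqrt(np.diag(cov))

# Synthetic data (assumed values): travel time, cost, and one irrelevant attribute.
rng = np.random.default_rng(0)
n_obs, demand = 200, 1000.0
x = rng.normal(size=(n_obs, 3))            # standardized attribute differences
theta_true = np.array([-1.5, -0.8, 0.0])   # third attribute does not affect choices
counts = predicted_counts(x, theta_true, demand) + rng.normal(scale=50.0, size=n_obs)

theta_hat = fit_ngd(x, counts, demand)
t_stats = t_statistics(x, counts, demand, theta_hat)
print("estimated coefficients:", np.round(theta_hat, 3))
print("t-statistics:", np.round(t_stats, 2))
```

In this toy setting, a large absolute t-statistic indicates that the attribute helps explain the observed counts, while a small one suggests it could be dropped. Normalizing the gradient keeps the step length under control even where the logit shares saturate and the raw gradient becomes very small.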