Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization
- URL: http://arxiv.org/abs/2404.14701v2
- Date: Sun, 28 Jul 2024 21:22:53 GMT
- Title: Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization
- Authors: Siqi Feng, Rui Yao, Stephane Hess, Ricardo A. Daziano, Timothy Brathwaite, Joan Walker, Shenhao Wang,
- Abstract summary: This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions.
It further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity.
The proposed framework is applicable to other NN-based choice models such as TasteNets.
- Score: 9.07183008265166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) frequently present behaviorally irregular patterns, significantly limiting their practical potentials and theoretical validity in travel behavior modeling. This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions (known as the "law of demand"), and further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity. The proposed framework is applied to travel survey data from Chicago and London to examine the trade-off between predictive power and behavioral regularity for large vs. small sample scenarios and in-domain vs. out-of-domain generalizations. The results demonstrate that, unlike models with strong behavioral foundations such as the multinomial logit, the benchmark DNNs cannot guarantee behavioral regularity. However, gradient regularization (GR) increases DNNs' behavioral regularity by around 6 percentage points (pp) while retaining their relatively high predictive power. In the small sample scenario, GR is more effective than in the large sample scenario, simultaneously improving behavioral regularity by about 20 pp and log-likelihood by around 1.7%. Comparing with the in-domain generalization of DNNs, GR works more effectively in out-of-domain generalization: it drastically improves the behavioral regularity of poorly performing benchmark DNNs by around 65 pp, indicating the criticality of behavioral regularization for enhancing model transferability and application in forecasting. Moreover, the proposed framework is applicable to other NN-based choice models such as TasteNets. Future studies could use behavioral regularity as a metric along with log-likelihood in evaluating travel demand models, and investigate other methods to further enhance behavioral regularity when adopting complex machine learning models.
Related papers
- Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Sparsely Changing Latent States for Prediction and Planning in Partially
Observable Domains [11.371889042789219]
GateL0RD is a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states.
We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks.
arXiv Detail & Related papers (2021-10-29T17:50:44Z) - MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test time adaptation, however, they each introduce additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z) - Predicting Deep Neural Network Generalization with Perturbation Response
Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z) - On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs)
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z) - Infinite attention: NNGP and NTK for deep attention networks [38.55012122588628]
We identify an equivalence between wide neural networks (NNs) and Gaussian processes (GPs)
We show that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity.
We introduce new features to the Neural Tangents library allowing applications of NNGP/NTK models, with and without attention, to variable-length sequences.
arXiv Detail & Related papers (2020-06-18T13:57:01Z) - Optimal Regularization Can Mitigate Double Descent [29.414119906479954]
We study whether the double-descent phenomenon can be avoided by using optimal regularization.
We demonstrate empirically that optimally-tuned $ell$ regularization can double descent for more general models, including neural networks.
arXiv Detail & Related papers (2020-03-04T05:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.