Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint
- URL: http://arxiv.org/abs/2402.15163v4
- Date: Sat, 25 Jan 2025 07:23:25 GMT
- Title: Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint
- Authors: Harshit Kumar, Beomseok Kang, Biswadeep Chakraborty, Saibal Mukhopadhyay,
- Abstract summary: This paper presents the first systematic study of evaluating Deep Neural Networks (DNNs)<n>We show that traditional evaluation methods assess a DNN's ability to replicate the observed ground truth but fail to measure the underlying process.<n>We propose a new evaluation criterion called Fidelity toGT Process (F2SP)
- Score: 17.897121328003617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the first systematic study of evaluating Deep Neural Networks (DNNs) designed to forecast the evolution of stochastic complex systems. We show that traditional evaluation methods like threshold-based classification metrics and error-based scoring rules assess a DNN's ability to replicate the observed ground truth but fail to measure the DNN's learning of the underlying stochastic process. To address this gap, we propose a new evaluation criterion called Fidelity to Stochastic Process (F2SP), representing the DNN's ability to predict the system property Statistic-GT--the ground truth of the stochastic process--and introduce an evaluation metric that exclusively assesses F2SP. We formalize F2SP within a stochastic framework and establish criteria for validly measuring it. We formally show that Expected Calibration Error (ECE) satisfies the necessary condition for testing F2SP, unlike traditional evaluation methods. Empirical experiments on synthetic datasets, including wildfire, host-pathogen, and stock market models, demonstrate that ECE uniquely captures F2SP. We further extend our study to real-world wildfire data, highlighting the limitations of conventional evaluation and discuss the practical utility of incorporating F2SP into model assessment. This work offers a new perspective on evaluating DNNs modeling complex systems by emphasizing the importance of capturing the underlying stochastic process.
Related papers
- Feature Ranking in Credit-Risk with Qudit-Based Networks [0.0]
In finance, predictive models must balance accuracy and interpretability.<n>We present a quantum neural network (QNN) based on a single qudit.<n>We benchmark our model on a real-world, imbalanced credit-risk dataset from Taiwan.
arXiv Detail & Related papers (2025-11-24T14:15:57Z) - Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach.<n>We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency.<n>Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z) - NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation [58.30936615525824]
We present NAIPv2, a debiased and efficient framework for paper quality estimation.<n> NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings.<n>It is trained on pairwise comparisons but enabling efficient pointwise prediction at deployment.
arXiv Detail & Related papers (2025-09-29T17:59:23Z) - Concolic Testing on Individual Fairness of Neural Network Models [0.754900594011865]
PyFair is a formal framework for evaluating and verifying individual fairness of Deep Neural Networks (DNNs)<n>Our key innovation is a dual network architecture that enables comprehensive fairness assessments and provides completeness guarantees for certain network types.
arXiv Detail & Related papers (2025-09-08T16:31:14Z) - The Lessons of Developing Process Reward Models in Mathematical Reasoning [62.165534879284735]
Process Reward Models (PRMs) aim to identify and mitigate intermediate errors in the reasoning processes.
We develop a consensus filtering mechanism that effectively integrates Monte Carlo (MC) estimation with Large Language Models (LLMs)
We release a new state-of-the-art PRM that outperforms existing open-source alternatives.
arXiv Detail & Related papers (2025-01-13T13:10:16Z) - Average-Over-Time Spiking Neural Networks for Uncertainty Estimation in Regression [3.409728296852651]
We introduce two methods that adapt the Average-Over-Time Spiking Neural Network (AOT-SNN) framework to regression tasks.
We evaluate our approaches on both a toy dataset and several benchmark datasets.
arXiv Detail & Related papers (2024-11-29T23:13:52Z) - Quantifying calibration error in modern neural networks through evidence based theory [0.0]
This paper introduces a novel framework for quantifying the trustworthiness of neural networks by incorporating subjective logic into the evaluation of Expected Error (ECE)
We demonstrate the effectiveness of this approach through experiments on MNIST and CIFAR-10 datasets where post-calibration results indicate improved trustworthiness.
The proposed framework offers a more interpretable and nuanced assessment of AI models, with potential applications in sensitive domains such as healthcare and autonomous systems.
arXiv Detail & Related papers (2024-10-31T23:54:21Z) - A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs)
Namely, we propose novel metrics with high probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - The Significance of Latent Data Divergence in Predicting System Degradation [1.2058600649065616]
Condition-Based Maintenance is pivotal in enabling the early detection of potential failures in engineering systems.
We introduce a novel methodology grounded in the analysis of statistical similarity within latent data from system components.
We infer the similarity between systems by evaluating the divergence of these priors, offering a nuanced understanding of individual system behaviors.
arXiv Detail & Related papers (2024-06-13T11:41:20Z) - A Bayesian Unification of Self-Supervised Clustering and Energy-Based
Models [11.007541337967027]
We perform a Bayesian analysis of state-of-the-art self-supervised learning objectives.
We show that our objective function allows to outperform existing self-supervised learning strategies.
We also demonstrate that GEDI can be integrated into a neuro-symbolic framework.
arXiv Detail & Related papers (2023-12-30T04:46:16Z) - From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks [0.0]
We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres.
Our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must.
arXiv Detail & Related papers (2023-11-27T21:37:50Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Amortised Inference in Bayesian Neural Networks [0.0]
We introduce the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN)
We show that the amortised inference is of similar or better quality to those obtained through traditional variational inference.
We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family.
arXiv Detail & Related papers (2023-09-06T14:02:33Z) - Expectation consistency for calibration of neural networks [24.073221004661427]
We introduce a novel calibration technique named expectation consistency (EC)
EC enforces that the average validation confidence coincides with the average proportion of correct labels.
We discuss examples where EC significantly outperforms temperature scaling.
arXiv Detail & Related papers (2023-03-05T11:21:03Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($mathcalI$-EDL)
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - Exploring validation metrics for offline model-based optimisation with
diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - Robust Deep Learning for Autonomous Driving [0.0]
We introduce a new criterion to reliably estimate model confidence: the true class probability ( TCP)
Since the true class is by essence unknown at test time, we propose to learn TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context.
We tackle the challenge of jointly detecting misclassification and out-of-distributions samples by introducing a new uncertainty measure based on evidential models and defined on the simplex.
arXiv Detail & Related papers (2022-11-14T22:07:11Z) - Evaluating Disentanglement in Generative Models Without Knowledge of
Latent Factors [71.79984112148865]
We introduce a method for ranking generative models based on the training dynamics exhibited during learning.
Inspired by recent theoretical characterizations of disentanglement, our method does not require supervision of the underlying latent factors.
arXiv Detail & Related papers (2022-10-04T17:27:29Z) - New Machine Learning Techniques for Simulation-Based Inference:
InferoStatic Nets, Kernel Score Estimation, and Kernel Likelihood Ratio
Estimation [4.415977307120616]
We propose a machine-learning approach to model the score and likelihood ratio estimators in cases when the probability density can be sampled but not computed directly.
We introduce new strategies, respectively called Kernel Score Estimation (KSE) and Kernel Likelihood Ratio Estimation (KLRE) to learn the score and the likelihood ratio functions from simulated data.
arXiv Detail & Related papers (2022-10-04T15:22:56Z) - Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z) - A Unified Contrastive Energy-based Model for Understanding the
Generative Ability of Adversarial Training [64.71254710803368]
Adversarial Training (AT) is an effective approach to enhance the robustness of deep neural networks.
We demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM)
We propose a principled method to develop adversarial learning and sampling methods.
arXiv Detail & Related papers (2022-03-25T05:33:34Z) - NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural
Networks [151.03112356092575]
We show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aim at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning, by maintaining a set of parallel models and estimate the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.