Has the Deep Neural Network learned the Stochastic Process? A Wildfire Perspective
- URL: http://arxiv.org/abs/2402.15163v3
- Date: Wed, 22 May 2024 18:50:10 GMT
- Title: Has the Deep Neural Network learned the Stochastic Process? A Wildfire Perspective
- Authors: Harshit Kumar, Beomseok Kang, Biswadeep Chakraborty, Saibal Mukhopadhyay,
- Abstract summary: This paper presents the first systematic study of evalution of Deep Neural Network (DNN)
We show that traditional evaluation methods assess a DNN's ability to replicate the observed ground truth (GT)
We propose a new system property: Statistic-GT, representing the GT of the process, and an evaluation metric that exclusively assesses fidelity to Statistic-GT.
- Score: 17.897121328003617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the first systematic study of evalution of Deep Neural Network (DNN) designed and trained to predict the evolution of a stochastic dynamical system, using wildfire prediction as a case study. We show that traditional evaluation methods based on threshold based classification metrics and error-based scoring rules assess a DNN's ability to replicate the observed ground truth (GT), but do not measure the fidelity of the DNN's learning of the underlying stochastic process. To address this gap, we propose a new system property: Statistic-GT, representing the GT of the stochastic process, and an evaluation metric that exclusively assesses fidelity to Statistic-GT. Utilizing a synthetic dataset, we introduce a stochastic framework to characterize this property and establish criteria for a metric to be a valid measure of the proposed property. We formally show that Expected Calibration Error (ECE) tests the necessary condition for fidelity to Statistic-GT. We perform empirical experiments, differentiating ECE's behavior from conventional metrics and demonstrate that ECE exclusively measures fidelity to the stochastic process. Extending our analysis to real-world wildfire data, we highlight the limitations of traditional evaluation methods and discuss the utility of evaluating fidelity to the stochastic process alongside existing metrics.
Related papers
- The Lessons of Developing Process Reward Models in Mathematical Reasoning [62.165534879284735]
Process Reward Models (PRMs) aim to identify and mitigate intermediate errors in the reasoning processes.
We develop a consensus filtering mechanism that effectively integrates Monte Carlo (MC) estimation with Large Language Models (LLMs)
We release a new state-of-the-art PRM that outperforms existing open-source alternatives.
arXiv Detail & Related papers (2025-01-13T13:10:16Z) - Average-Over-Time Spiking Neural Networks for Uncertainty Estimation in Regression [3.409728296852651]
We introduce two methods that adapt the Average-Over-Time Spiking Neural Network (AOT-SNN) framework to regression tasks.
We evaluate our approaches on both a toy dataset and several benchmark datasets.
arXiv Detail & Related papers (2024-11-29T23:13:52Z) - Quantifying calibration error in modern neural networks through evidence based theory [0.0]
This paper introduces a novel framework for quantifying the trustworthiness of neural networks by incorporating subjective logic into the evaluation of Expected Error (ECE)
We demonstrate the effectiveness of this approach through experiments on MNIST and CIFAR-10 datasets where post-calibration results indicate improved trustworthiness.
The proposed framework offers a more interpretable and nuanced assessment of AI models, with potential applications in sensitive domains such as healthcare and autonomous systems.
arXiv Detail & Related papers (2024-10-31T23:54:21Z) - A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs)
Namely, we propose novel metrics with high probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - The Significance of Latent Data Divergence in Predicting System Degradation [1.2058600649065616]
Condition-Based Maintenance is pivotal in enabling the early detection of potential failures in engineering systems.
We introduce a novel methodology grounded in the analysis of statistical similarity within latent data from system components.
We infer the similarity between systems by evaluating the divergence of these priors, offering a nuanced understanding of individual system behaviors.
arXiv Detail & Related papers (2024-06-13T11:41:20Z) - A Bayesian Unification of Self-Supervised Clustering and Energy-Based
Models [11.007541337967027]
We perform a Bayesian analysis of state-of-the-art self-supervised learning objectives.
We show that our objective function allows to outperform existing self-supervised learning strategies.
We also demonstrate that GEDI can be integrated into a neuro-symbolic framework.
arXiv Detail & Related papers (2023-12-30T04:46:16Z) - From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks [0.0]
We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres.
Our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must.
arXiv Detail & Related papers (2023-11-27T21:37:50Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Amortised Inference in Bayesian Neural Networks [0.0]
We introduce the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN)
We show that the amortised inference is of similar or better quality to those obtained through traditional variational inference.
We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family.
arXiv Detail & Related papers (2023-09-06T14:02:33Z) - Expectation consistency for calibration of neural networks [24.073221004661427]
We introduce a novel calibration technique named expectation consistency (EC)
EC enforces that the average validation confidence coincides with the average proportion of correct labels.
We discuss examples where EC significantly outperforms temperature scaling.
arXiv Detail & Related papers (2023-03-05T11:21:03Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($mathcalI$-EDL)
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - Exploring validation metrics for offline model-based optimisation with
diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - Robust Deep Learning for Autonomous Driving [0.0]
We introduce a new criterion to reliably estimate model confidence: the true class probability ( TCP)
Since the true class is by essence unknown at test time, we propose to learn TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context.
We tackle the challenge of jointly detecting misclassification and out-of-distributions samples by introducing a new uncertainty measure based on evidential models and defined on the simplex.
arXiv Detail & Related papers (2022-11-14T22:07:11Z) - Evaluating Disentanglement in Generative Models Without Knowledge of
Latent Factors [71.79984112148865]
We introduce a method for ranking generative models based on the training dynamics exhibited during learning.
Inspired by recent theoretical characterizations of disentanglement, our method does not require supervision of the underlying latent factors.
arXiv Detail & Related papers (2022-10-04T17:27:29Z) - New Machine Learning Techniques for Simulation-Based Inference:
InferoStatic Nets, Kernel Score Estimation, and Kernel Likelihood Ratio
Estimation [4.415977307120616]
We propose a machine-learning approach to model the score and likelihood ratio estimators in cases when the probability density can be sampled but not computed directly.
We introduce new strategies, respectively called Kernel Score Estimation (KSE) and Kernel Likelihood Ratio Estimation (KLRE) to learn the score and the likelihood ratio functions from simulated data.
arXiv Detail & Related papers (2022-10-04T15:22:56Z) - Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z) - A Unified Contrastive Energy-based Model for Understanding the
Generative Ability of Adversarial Training [64.71254710803368]
Adversarial Training (AT) is an effective approach to enhance the robustness of deep neural networks.
We demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM)
We propose a principled method to develop adversarial learning and sampling methods.
arXiv Detail & Related papers (2022-03-25T05:33:34Z) - NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural
Networks [151.03112356092575]
We show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aim at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning, by maintaining a set of parallel models and estimate the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.