Large-scale empirical validation of Bayesian Network structure learning
algorithms with noisy data
- URL: http://arxiv.org/abs/2005.09020v2
- Date: Fri, 11 Sep 2020 13:12:00 GMT
- Title: Large-scale empirical validation of Bayesian Network structure learning
algorithms with noisy data
- Authors: Anthony C. Constantinou, Yang Liu, Kiattikun Chobtham, Zhigao Guo and
Neville K. Kitson
- Abstract summary: This paper investigates the performance of 15 structure learning algorithms.
Each algorithm is tested over multiple case studies, sample sizes, types of noise, and assessed with multiple evaluation criteria.
Results suggest traditional synthetic performance may overestimate real-world performance by anywhere between 10% and more than 50%.
- Score: 9.04391541965756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous Bayesian Network (BN) structure learning algorithms have been
proposed in the literature over the past few decades. Each publication makes an
empirical or theoretical case for the algorithm proposed in that publication
and results across studies are often inconsistent in their claims about which
algorithm is 'best'. This is partly because there is no agreed evaluation
approach to determine their effectiveness. Moreover, each algorithm is based on
a set of assumptions, such as complete data and causal sufficiency, and tends to
be evaluated with data that conforms to these assumptions, however unrealistic
these assumptions may be in the real world. As a result, it is widely accepted
that synthetic performance overestimates real performance, although to what
degree this may happen remains unknown. This paper investigates the performance
of 15 structure learning algorithms. We propose a methodology that applies the
algorithms to data that incorporates synthetic noise, in an effort to better
understand the performance of structure learning algorithms when applied to
real data. Each algorithm is tested over multiple case studies, sample sizes,
types of noise, and assessed with multiple evaluation criteria. This work
involved approximately 10,000 graphs with a total structure learning runtime of
seven months. It provides the first large-scale empirical validation of BN
structure learning algorithms under different assumptions of data noise. The
results suggest that traditional synthetic performance may overestimate
real-world performance by anywhere between 10% and more than 50%. They also
show that while score-based learning is generally superior to constraint-based
learning, a higher fitting score does not necessarily imply a more accurate
causal graph. To facilitate comparisons with future studies, we have made all
data, raw results, graphs and BN models freely available online.
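The evaluation pipeline the abstract describes (inject synthetic noise into clean data, learn a structure, compare it against the ground-truth graph) can be sketched in plain Python. The noise model and the Structural Hamming Distance (SHD) metric below are illustrative assumptions chosen for brevity, not the paper's exact implementations; the paper tests several noise types and multiple evaluation criteria.

```python
import random

def add_noise(rows, noise_rate=0.1, states=(0, 1), seed=0):
    """Corrupt a fraction of cells with random states to simulate
    measurement error (one of several noise types such a study tests)."""
    rng = random.Random(seed)
    return [
        [rng.choice(states) if rng.random() < noise_rate else v for v in row]
        for row in rows
    ]

def shd(true_edges, learned_edges):
    """Structural Hamming Distance between two directed graphs:
    +1 for each missing, extra, or reversed edge (a reversal counts once)."""
    t, l = set(true_edges), set(learned_edges)
    dist = 0
    for a, b in t:
        if (a, b) not in l:
            dist += 1  # missing from the learned graph, or reversed
    for a, b in l:
        if (a, b) not in t and (b, a) not in t:
            dist += 1  # extra edge with no counterpart in the true graph
    return dist

# Example: true graph A->B->C, learned graph with A-B reversed and a
# spurious C->D edge gives SHD = 2.
true_g = [("A", "B"), ("B", "C")]
learned_g = [("B", "A"), ("B", "C"), ("C", "D")]
print(shd(true_g, learned_g))  # -> 2
```

Metrics like SHD are one reason a higher fitting score need not imply a more accurate causal graph: the score measures fit to data, while SHD measures distance from the true structure.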
Related papers
- Classic algorithms are fair learners: Classification Analysis of natural
weather and wildfire occurrences [0.0]
This paper reviews the empirical performance of widely used classical supervised learning algorithms such as Decision Trees, Boosting, Support Vector Machines, k-Nearest Neighbors, and a shallow Artificial Neural Network.
arXiv Detail & Related papers (2023-09-04T06:11:55Z)
- Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose Camilla, a task-agnostic evaluation framework for machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines on metric reliability, rank consistency, and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z)
- Performance Evaluation and Comparison of a New Regression Algorithm [4.125187280299247]
We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms.
The reader is free to replicate our results since we have provided the source code in a GitHub repository.
arXiv Detail & Related papers (2023-06-15T13:01:16Z)
- TrueDeep: A systematic approach of crack detection with less data [0.0]
We show that by incorporating domain knowledge along with deep learning architectures, we can achieve similar performance with less data.
Our algorithms, developed with 23% of the overall data, have a similar performance on the test data and significantly better performance on multiple blind datasets.
arXiv Detail & Related papers (2023-05-30T14:51:58Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries of the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
- A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data [0.0]
Counterfactual explanations are viewed as an effective way to explain machine learning predictions.
There are already dozens of algorithms aiming to generate such explanations.
Our benchmarking study and framework can help practitioners determine which techniques and building blocks best suit their context.
arXiv Detail & Related papers (2021-07-09T21:06:03Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Improving Bayesian Network Structure Learning in the Presence of Measurement Error [11.103936437655575]
This paper describes an algorithm that can be added as an additional learning phase at the end of any structure learning algorithm.
The proposed correction algorithm successfully improves the graphical score of four well-established structure learning algorithms.
arXiv Detail & Related papers (2020-11-19T11:27:47Z)
- A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks [70.88503833248159]
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
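Constraint-based learners like the one above decide which edges to keep by running statistical tests of (conditional) independence between variables. As a simplified illustration of the most basic such test, the Pearson chi-squared statistic for marginal independence of two discrete variables can be computed in a few lines; this is a hedged sketch, since real learners run conditional versions of the test and compare the statistic against a significance threshold.

```python
from collections import Counter

def chi2_statistic(xs, ys):
    """Pearson chi-squared statistic for independence of two discrete
    variables; larger values give more evidence against independence.
    (Illustrative only: constraint-based learners use conditional
    variants and a critical value at a chosen significance level.)"""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for x in cx:
        for y in cy:
            expected = cx[x] * cy[y] / n  # count expected under independence
            observed = cxy.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

# Perfectly correlated columns yield a large statistic ...
print(chi2_statistic([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 4.0
# ... while a balanced independent pattern yields 0.
print(chi2_statistic([0, 0, 1, 1], [0, 1, 0, 1]))  # -> 0.0
```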
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.