Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms
- URL: http://arxiv.org/abs/2107.03863v4
- Date: Mon, 4 Dec 2023 13:24:31 GMT
- Title: Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms
- Authors: Felix L. Rios, Giusi Moffa, Jack Kuipers
- Abstract summary: Probabilistic graphical models are one common approach to modelling the data generating mechanism.
We present a novel Snakemake workflow called Benchpress for producing scalable, reproducible, and platform-independent benchmarks.
We demonstrate the applicability of this workflow for learning Bayesian networks in five typical data scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Describing the relationship between the variables in a study domain and
modelling the data generating mechanism is a fundamental problem in many
empirical sciences. Probabilistic graphical models are one common approach to
tackle the problem. Learning the graphical structure for such models is
computationally challenging and a fervent area of current research with a
plethora of algorithms being developed. To facilitate the benchmarking of
different methods, we present a novel Snakemake workflow, called Benchpress, for
producing scalable, reproducible, and platform-independent benchmarks of
structure learning algorithms for probabilistic graphical models. Benchpress is
configured via a simple JSON file, which makes it accessible to all users,
while the code is designed in a fully modular fashion to enable researchers to
contribute additional methodologies. Benchpress currently provides an interface
to a large number of state-of-the-art algorithms from libraries such as
BDgraph, BiDAG, bnlearn, causal-learn, gCastle, GOBNILP, pcalg, r.blip,
scikit-learn, TETRAD, and trilearn as well as a variety of methods for data
generating models and performance evaluation. Alongside user-defined models and
randomly generated datasets, the workflow also includes a number of standard
datasets and graphical models from the literature, which may be included in a
benchmarking study. We demonstrate the applicability of this workflow for
learning Bayesian networks in five typical data scenarios. The source code and
documentation are publicly available from http://benchpressdocs.readthedocs.io.
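To make the JSON interface concrete, here is a minimal sketch of driving a benchmark run from Python. The configuration keys below are hypothetical placeholders, not Benchpress's actual schema (see the linked documentation for that); only the snakemake flags are standard.

```python
import json
import subprocess

# Hypothetical configuration -- the keys below are illustrative placeholders,
# NOT Benchpress's actual JSON schema (see the linked documentation for that).
config = {
    "benchmark_id": "bn_demo",
    "data": {
        "model": "random_bayesian_network",   # assumed generator name
        "n_nodes": 20,
        "n_samples": [100, 1000],
        "seeds": list(range(5)),
    },
    "algorithms": [
        {"id": "pc", "library": "pcalg", "alpha": 0.05},
        {"id": "hc", "library": "bnlearn", "score": "bic"},
    ],
    "evaluation": {"metrics": ["shd", "tpr", "fpr"]},
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Snakemake drives the workflow; these are standard snakemake flags.
# --use-singularity pulls each method's container for platform independence.
subprocess.run(
    ["snakemake", "--cores", "all", "--use-singularity",
     "--configfile", "config.json"],
    check=True,
)
```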
Related papers
- Multi-View Stochastic Block Models [34.55723218769512]
We formalize a new family of models, called multi-view block models, that captures this setting.
For this model, we first study efficient algorithms that naively work on the union of multiple graphs.
Then, we introduce a new efficient algorithm that provably outperforms previous approaches by analyzing the structure of each graph separately.
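As a rough illustration of the setting, the following numpy sketch samples several graph views that share one latent community structure; the parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_view(labels, p_in, p_out):
    """Sample one undirected graph whose edges follow shared communities."""
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)          # within- vs between-community
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return upper | upper.T

n_nodes, n_communities, n_views = 60, 3, 4
labels = rng.integers(n_communities, size=n_nodes)   # one latent partition
views = [sample_view(labels, p_in=0.6, p_out=0.1) for _ in range(n_views)]

# The naive baseline pools information by taking the union of all views;
# the paper's improved algorithm analyses each view's structure separately.
union = np.any(views, axis=0)
```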
arXiv Detail & Related papers (2024-06-07T11:45:31Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
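A minimal sketch of the compute-equivalent protocol: fix an accelerator-hour budget and derive each model's token budget from its measured throughput. All numbers below are invented for illustration.

```python
# Fix an accelerator-hour budget and give every model a token budget derived
# from its measured throughput; all numbers here are invented.
budget_hours = 24.0
tokens_per_hour = {"gpt2_style_ff": 9.0e6, "fast_lstm": 9.0e7}  # measured empirically
token_budget = {m: t * budget_hours for m, t in tokens_per_hour.items()}
print(token_budget)  # same compute, different amounts of data seen
```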
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
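One plausible reading of this construction, sketched below, embeds each graph as a zero-mean Gaussian via the Laplacian pseudoinverse and averages with the standard Bures-Wasserstein fixed-point iteration; this is an illustration under assumed choices, not the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import sqrtm, pinv

def graph_embedding(adj):
    """Embed a graph as the covariance of a smooth signal model, N(0, L^+)."""
    L = np.diag(adj.sum(axis=1)) - adj          # combinatorial Laplacian
    return pinv(L)                              # Moore-Penrose pseudoinverse

def bw_barycenter(covs, iters=50):
    """Standard fixed-point iteration for the Bures-Wasserstein mean."""
    S = np.mean(covs, axis=0)                   # start at the Euclidean mean
    for _ in range(iters):
        R = np.real(sqrtm(S))
        T = np.mean([np.real(sqrtm(R @ C @ R)) for C in covs], axis=0)
        Rinv = pinv(R)
        S = Rinv @ T @ T @ Rinv                 # S <- S^{-1/2} T^2 S^{-1/2}
    return S

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle
mean_cov = bw_barycenter([graph_embedding(A1), graph_embedding(A2)])
```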
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
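The prefix-LM unification amounts to a particular attention mask: bidirectional over the prefix, causal over the rest. A minimal sketch of that mask pattern follows (not CodeGen2's actual implementation).

```python
import numpy as np

def prefix_lm_mask(seq_len, prefix_len):
    """True where position i may attend to position j: causal everywhere,
    plus full bidirectional attention within the prefix."""
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal base
    mask[:prefix_len, :prefix_len] = True                    # open up the prefix
    return mask

print(prefix_lm_mask(6, 3).astype(int))
```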
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- A Closer Look at Few-shot Classification Again [68.44963578735877]
Few-shot classification consists of a training phase and an adaptation phase.
We empirically prove that the training algorithm and the adaptation algorithm can be completely disentangled.
Our meta-analysis for each phase reveals several interesting insights that may help better understand key aspects of few-shot classification.
arXiv Detail & Related papers (2023-01-28T16:42:05Z)
- pyGSL: A Graph Structure Learning Toolkit [14.000763778781547]
pyGSL is a Python library that provides efficient implementations of state-of-the-art graph structure learning models.
pyGSL is written in a GPU-friendly way, allowing it to scale to much larger network tasks.
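For orientation, the task pyGSL accelerates is inferring an adjacency matrix from observed node signals. The baseline below is a generic correlation-thresholding sketch and deliberately does not use pyGSL's API.

```python
import numpy as np

def correlation_graph(X, density=0.1):
    """Keep the strongest absolute correlations between node signals as edges.
    X has shape (n_nodes, n_samples)."""
    C = np.abs(np.corrcoef(X))
    np.fill_diagonal(C, 0.0)
    thresh = np.quantile(C, 1.0 - density)      # keep the top fraction
    return (C >= thresh).astype(int)

X = np.random.randn(30, 500)                    # 30 node signals, 500 samples
adjacency = correlation_graph(X, density=0.1)
```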
arXiv Detail & Related papers (2022-11-07T14:23:10Z)
- A Framework for Large Scale Synthetic Graph Dataset Generation [2.248608623448951]
This work proposes a scalable synthetic graph generation tool to scale the datasets to production-size graphs.
The tool learns a series of parametric models from proprietary datasets; these models can then be released to researchers.
We demonstrate the generalizability of the framework across a series of datasets.
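A toy version of the idea, under assumed choices (a lognormal degree model and Chung-Lu sampling via networkx), fits parameters on a small graph and samples a much larger one:

```python
import numpy as np
import networkx as nx

def fit_and_scale(g_small, scale=10, seed=0):
    """Fit a lognormal degree model on a small graph, then sample a graph
    'scale' times larger with matching expected degrees (Chung-Lu model)."""
    rng = np.random.default_rng(seed)
    deg = np.array([d for _, d in g_small.degree()], dtype=float) + 1.0
    mu, sigma = np.log(deg).mean(), np.log(deg).std()
    n_big = scale * g_small.number_of_nodes()
    w = rng.lognormal(mu, sigma, size=n_big)    # expected degree per node
    return nx.expected_degree_graph(w, selfloops=False)

g_big = fit_and_scale(nx.karate_club_graph(), scale=100)
```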
arXiv Detail & Related papers (2022-10-04T22:41:33Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Bayesian Deep Learning for Graphs [6.497816402045099]
The dissertation begins with a review of the principles on which most methods in the field are built, followed by a study of graph classification issues.
We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion.
This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks.
arXiv Detail & Related papers (2022-02-24T20:18:41Z)
- Captum: A unified and generic model interpretability library for PyTorch [49.72749684393332]
We introduce a novel, unified, open-source model interpretability library for PyTorch.
The library contains generic implementations of a number of gradient and perturbation-based attribution algorithms.
It can be used for both classification and non-classification models.
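A minimal usage sketch with one of the library's gradient-based attribution methods, Integrated Gradients, on a toy stand-in model:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy classifier standing in for any PyTorch model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

inputs = torch.randn(2, 4, requires_grad=True)
ig = IntegratedGradients(model)

# Attribute the class-0 logit to each input feature; by default the
# integration path starts from an all-zeros baseline.
attributions, delta = ig.attribute(inputs, target=0,
                                   return_convergence_delta=True)
print(attributions.shape)   # (2, 4): one attribution score per input feature
```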
arXiv Detail & Related papers (2020-09-16T18:57:57Z)
- PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development.
It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
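A usage sketch in the spirit of PHOTONAI's documented Hyperpipe/PipelineElement pattern; exact signatures may differ across versions, so treat this as illustrative rather than authoritative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from photonai.base import Hyperpipe, PipelineElement

# Sketch only: follows the Hyperpipe/PipelineElement pattern from PHOTONAI's
# examples; check the current docs for exact signatures.
pipe = Hyperpipe("demo_pipe",
                 optimizer="grid_search",
                 metrics=["accuracy", "balanced_accuracy"],
                 best_config_metric="accuracy",
                 inner_cv=KFold(n_splits=5))

pipe += PipelineElement("StandardScaler")                 # sklearn under the hood
pipe += PipelineElement("SVC",
                        hyperparameters={"C": [0.1, 1.0, 10.0]},
                        kernel="rbf")

X, y = load_breast_cancer(return_X_y=True)
pipe.fit(X, y)
```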
arXiv Detail & Related papers (2020-02-13T10:33:05Z)