DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
- URL: http://arxiv.org/abs/2205.11234v1
- Date: Fri, 6 May 2022 17:43:27 GMT
- Title: DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
- Authors: Ghadi S. Al Hajj, Johan Pensar, Geir Kjetil Sandve
- Abstract summary: We present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations.
A succinct YAML format for defining the simulation model structure promotes transparency.
We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
- Score: 2.685173014586162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data simulation is fundamental for machine learning and causal inference, as
it allows exploration of scenarios and assessment of methods in settings with
full control of ground truth. Directed acyclic graphs (DAGs) are well
established for encoding the dependence structure over a collection of
variables in both inference and simulation settings. However, while modern
machine learning is applied to data of an increasingly complex nature,
DAG-based simulation frameworks are still confined to settings with relatively
simple variable types and functional forms. We here present DagSim, a
Python-based framework for DAG-based data simulation without any constraints on
variable types or functional relations. A succinct YAML format for defining the
simulation model structure promotes transparency, while separate user-provided
functions for generating each variable based on its parents ensure simulation
code modularization. We illustrate the capabilities of DagSim through use cases
where metadata variables control shapes in an image and patterns in
bio-sequences.
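The abstract's central mechanism is a DAG that fixes which variables depend on which, plus one user-provided function per variable that generates it from its parents. The short, self-contained Python sketch below illustrates that pattern. It is not DagSim's API. The dict MODEL stands in for what a parsed YAML spec might look like, and its keys ("parents", "function"), the node names, and the GATTACA motif are hypothetical. They loosely mirror the bio-sequence use case, where a metadata variable controls a pattern in a sequence.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical motif: the bio-sequence use case comes from the abstract, this value does not.
MOTIF = "GATTACA"


def sample_label(rng):
    # Binary metadata variable (root node): should the sequence carry the signal?
    return rng.random() < 0.5


def sample_length(rng):
    # Integer root node: sequence length.
    return rng.randint(20, 40)


def sample_sequence(rng, label, length):
    # String-valued child node: random DNA background, with MOTIF implanted
    # at a random position whenever the parent label is True.
    seq = [rng.choice("ACGT") for _ in range(length)]
    if label:
        start = rng.randrange(length - len(MOTIF) + 1)
        seq[start:start + len(MOTIF)] = MOTIF
    return "".join(seq)


# Stand-in for a parsed YAML spec. The keys "parents" and "function" are
# illustrative assumptions, not DagSim's actual schema.
MODEL = {
    "label": {"parents": [], "function": sample_label},
    "length": {"parents": [], "function": sample_length},
    "sequence": {"parents": ["label", "length"], "function": sample_sequence},
}


def simulate(model, n, seed=0):
    """Draw n joint samples by visiting nodes in topological (ancestral) order."""
    rng = random.Random(seed)
    # Map each node to its predecessors; tuples keep the traversal order deterministic.
    graph = {name: tuple(node["parents"]) for name, node in model.items()}
    # static_order() yields parents before children and raises CycleError on cycles.
    order = list(TopologicalSorter(graph).static_order())
    rows = []
    for _ in range(n):
        values = {}
        for name in order:
            node = model[name]
            # Pass parent values positionally, in the order listed in the spec.
            args = [values[p] for p in node["parents"]]
            values[name] = node["function"](rng, *args)
        rows.append(values)
    return rows


if __name__ == "__main__":
    for row in simulate(MODEL, n=3, seed=42):
        print(row["label"], row["length"], row["sequence"])
```

Visiting nodes in topological order (ancestral sampling) guarantees that every parent value exists before a child's function runs. Each function receives only its parents' values, so nodes can hold any Python type: here a bool, an int and a string. graphlib raises CycleError, a subclass of ValueError, if the spec is not acyclic.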
Related papers
- sbi reloaded: a toolkit for simulation-based inference workflows [15.696312591547283]
sbi is a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks.
The sbi toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators.
(2024-11-26T11:31:47Z)
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
(2024-10-02T04:01:38Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
(2024-06-07T14:39:28Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse GAMs' flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
(2024-04-15T13:44:01Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
(2024-02-19T15:57:39Z)
- Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations [0.0]
Self-supervised learning is the backbone of state-of-the-art language modeling.
It has been argued that training with a predictive loss on a self-supervised dataset gives rise to simulators.
(2023-11-29T09:32:56Z)
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
(2023-06-05T15:19:06Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
(2023-01-23T15:18:54Z)
- Modular machine learning-based elastoplasticity: generalization in the context of limited data [0.0]
We discuss a hybrid framework that can work on a variable amount of data by relying on the modularity of the elastoplasticity formulation.
The discovered material models are found to not only interpolate well but also allow for accurate extrapolation in a thermodynamically consistent manner far outside the domain of the training data.
(2022-10-15T17:35:23Z)
- Enhancing Mechanical Metamodels with a Generative Model-Based Augmented Training Dataset [0.7734726150561089]
Microstructural patterns, which play a major role in defining the mechanical behavior of tissues, are difficult to simulate.
In this work, we investigate the efficacy of machine learning-based generative models as a tool for augmenting limited input pattern datasets.
We have created an open-access dataset of finite element analysis simulations based on Cahn-Hilliard patterns.
(2022-03-08T16:15:54Z)
- Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation [88.04759848307687]
In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.
We use Reinforcement Learning to train our model, and design a feature space divergence between our synthesized and target images that is key to successful training.
We also show that this leads to downstream improvement in the performance of an object detector trained on our generated dataset as opposed to other baseline simulation methods.
(2020-08-20T17:28:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.