DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
- URL: http://arxiv.org/abs/2205.11234v1
- Date: Fri, 6 May 2022 17:43:27 GMT
- Title: DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
- Authors: Ghadi S. Al Hajj, Johan Pensar, Geir Kjetil Sandve
- Abstract summary: We present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations.
A succinct YAML format for defining the simulation model structure promotes transparency.
We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
- Score: 2.685173014586162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data simulation is fundamental for machine learning and causal inference, as
it allows exploration of scenarios and assessment of methods in settings with
full control of ground truth. Directed acyclic graphs (DAGs) are well
established for encoding the dependence structure over a collection of
variables in both inference and simulation settings. However, while modern
machine learning is applied to data of an increasingly complex nature,
DAG-based simulation frameworks are still confined to settings with relatively
simple variable types and functional forms. We here present DagSim, a
Python-based framework for DAG-based data simulation without any constraints on
variable types or functional relations. A succinct YAML format for defining the
simulation model structure promotes transparency, while separate user-provided
functions for generating each variable based on its parents ensure simulation
code modularization. We illustrate the capabilities of DagSim through use cases
where metadata variables control shapes in an image and patterns in
bio-sequences.
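The abstract's central mechanism is that each variable gets its own user-provided function, and the functions are evaluated in DAG order so that every variable is generated from its already-simulated parents. Below is a minimal sketch of that mechanism in plain Python (3.9+ for `graphlib`). It is not DagSim's actual API: the names `simulate`, `SPEC`, `sample_disease`, `sample_sequence`, the motif `GATTACA`, and the 0.3 label probability are all hypothetical, and the YAML model file is stood in for by a Python dict. It mirrors the bio-sequence use case, where a metadata variable controls whether a pattern appears in a sequence.

```python
import random
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical model spec (NOT DagSim's API): node name -> (parent names, function).
# Each function receives a random generator followed by its parents' values,
# in the order the parents are listed.
Spec = Dict[str, Tuple[Tuple[str, ...], Callable[..., Any]]]


def simulate(spec: Spec, n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Ancestral sampling: evaluate every node only after all of its parents."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors; static_order() yields parents
    # before children and raises graphlib.CycleError if the graph is not a DAG.
    graph = {name: set(parents) for name, (parents, _) in spec.items()}
    order = list(TopologicalSorter(graph).static_order())
    samples = []
    for _ in range(n):
        row: Dict[str, Any] = {}
        for name in order:
            parents, fn = spec[name]
            row[name] = fn(rng, *(row[p] for p in parents))
        samples.append(row)
    return samples


MOTIF = "GATTACA"  # hypothetical signal pattern


def sample_disease(rng: random.Random) -> bool:
    # Metadata variable: a binary label that is True with probability 0.3 (assumed).
    return rng.random() < 0.3


def sample_sequence(rng: random.Random, disease: bool, length: int = 30) -> str:
    # Random DNA background; if the parent label is True, overwrite a window with the motif.
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    if disease:
        pos = rng.randrange(length - len(MOTIF) + 1)
        seq = seq[:pos] + MOTIF + seq[pos + len(MOTIF):]
    return seq


# Listed child-first on purpose: the topological sort, not dict order, fixes evaluation order.
SPEC: Spec = {
    "sequence": (("disease",), sample_sequence),
    "disease": ((), sample_disease),
}

if __name__ == "__main__":
    for row in simulate(SPEC, n=3, seed=42):
        print(row)
```

Because each variable's generator is an ordinary function keyed by node name, replacing the sequence model or adding another metadata node touches only one entry of the spec, which is the kind of modularity the abstract attributes to separate per-variable functions.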
Related papers
- Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI [3.3126968968429407]
Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs)
Our primary goal is to create a framework that harnesses the agentic abilities of LLMs while addressing their limitations in maintaining consistency.
Date: 2025-01-30T16:38:03Z
- GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator [55.02281855589641]
GauSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics, modeling each kernel as a continuous piece of matter to account for realistic deformations without idealized assumptions.
GauSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
Date: 2024-12-23T18:58:17Z
- sbi reloaded: a toolkit for simulation-based inference workflows [15.696312591547283]
sbi is a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks.
The sbi toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators.
Date: 2024-11-26T11:31:47Z
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
Date: 2024-10-02T04:01:38Z
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
Date: 2024-06-07T14:39:28Z
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse the flexible shape functions of GAMs with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
Date: 2024-04-15T13:44:01Z
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
Date: 2024-02-19T15:57:39Z
- Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations [0.0]
Self-supervised learning is the backbone of state-of-the-art language modeling.
It has been argued that training with predictive loss on a self-supervised dataset gives rise to simulators.
Date: 2023-11-29T09:32:56Z
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
Date: 2023-06-05T15:19:06Z
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
Date: 2023-01-23T15:18:54Z
- Enhancing Mechanical Metamodels with a Generative Model-Based Augmented Training Dataset [0.7734726150561089]
Microstructural patterns, which play a major role in defining the mechanical behavior of tissues, are difficult to simulate.
In this work, we investigate the efficacy of machine learning-based generative models as a tool for augmenting limited input pattern datasets.
We have created an open access dataset of Finite Element Analysis simulations based on Cahn-Hilliard patterns.
Date: 2022-03-08T16:15:54Z