DagSim: Combining DAG-based model structure with unconstrained data
types and relations for flexible, transparent, and modularized data
simulation
- URL: http://arxiv.org/abs/2205.11234v1
- Date: Fri, 6 May 2022 17:43:27 GMT
- Authors: Ghadi S. Al Hajj, Johan Pensar, Geir Kjetil Sandve
- Abstract summary: We present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations.
A succinct YAML format for defining the simulation model structure promotes transparency.
We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
- Score: 2.685173014586162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data simulation is fundamental for machine learning and causal inference, as
it allows exploration of scenarios and assessment of methods in settings with
full control of ground truth. Directed acyclic graphs (DAGs) are well
established for encoding the dependence structure over a collection of
variables in both inference and simulation settings. However, while modern
machine learning is applied to data of an increasingly complex nature,
DAG-based simulation frameworks are still confined to settings with relatively
simple variable types and functional forms. We here present DagSim, a
Python-based framework for DAG-based data simulation without any constraints on
variable types or functional relations. A succinct YAML format for defining the
simulation model structure promotes transparency, while separate user-provided
functions for generating each variable based on its parents ensure simulation
code modularization. We illustrate the capabilities of DagSim through use cases
where metadata variables control shapes in an image and patterns in
bio-sequences.
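
The abstract describes two ingredients: a DAG over the variables, and one user-provided function per variable that generates its value from the values of its parents. The sketch below illustrates that idea with the Python standard library only. It does not use or reproduce the DagSim API or its YAML format; the names `MODEL`, `simulate`, `draw_shape_kind`, `draw_size`, and `render_image` are hypothetical and chosen for illustration, loosely echoing the use case where metadata variables control a shape in an image.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical per-node generating functions. Each one receives a random
# generator plus the already-simulated values of its parents, in order.
def draw_shape_kind(rng):
    return rng.choice(["circle", "square"])


def draw_size(rng):
    return rng.randint(2, 5)


def render_image(rng, kind, size):
    # A node value can be any Python object; here it is an 11x11 grid of 0/1.
    n = 11
    c = n // 2
    grid = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            dx, dy = x - c, y - c
            if kind == "circle":
                inside = dx * dx + dy * dy <= size * size
            else:
                inside = max(abs(dx), abs(dy)) <= size
            grid[y][x] = int(inside)
    return grid


# Assumed model layout: node name -> (generating function, parent names).
MODEL = {
    "kind": (draw_shape_kind, []),
    "size": (draw_size, []),
    "image": (render_image, ["kind", "size"]),
}


def simulate(model, num_samples, seed=0):
    """Draw samples by visiting the nodes in topological order."""
    rng = random.Random(seed)
    # TopologicalSorter expects {node: predecessors}; static_order() raises
    # graphlib.CycleError (a ValueError) if the structure is not a DAG.
    parents_of = {name: parents for name, (_, parents) in model.items()}
    order = list(TopologicalSorter(parents_of).static_order())
    samples = []
    for _ in range(num_samples):
        values = {}
        for name in order:
            func, parents = model[name]
            values[name] = func(rng, *(values[p] for p in parents))
        samples.append(values)
    return samples


samples = simulate(MODEL, num_samples=3)
```

Because each variable has its own function, swapping `render_image` for a different renderer leaves the rest of the model untouched, which is the kind of modularization the abstract refers to.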
Related papers
- Factor Analysis with Correlated Topic Model for Multi-Modal Data [0.0]
Multimodal factor analysis (FA) uncovers shared axes of variation underlying simple data modalities.
FA is not suited for structured data modalities, such as text or single cell sequencing data.
We introduce FACTM, a novel, multi-view and multi-structure Bayesian model that combines FA with correlated topic modeling and is optimized using variational inference.
arXiv Detail & Related papers (2025-04-26T13:02:53Z)
- Model Assembly Learning with Heterogeneous Layer Weight Merging [57.8462476398611]
We introduce Model Assembly Learning (MAL), a novel paradigm for model merging.
MAL integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities.
arXiv Detail & Related papers (2025-03-27T16:21:53Z)
- Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI [3.3126968968429407]
Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs).
Our primary goal is to create a framework that harnesses the agentic abilities of LLMs while addressing their limitations in maintaining consistency.
arXiv Detail & Related papers (2025-01-30T16:38:03Z)
- GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter.
In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z)
- sbi reloaded: a toolkit for simulation-based inference workflows [15.696312591547283]
sbi is a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks.
The sbi toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators.
arXiv Detail & Related papers (2024-11-26T11:31:47Z)
- LOCAL: Learning with Orientation Matrix to Infer Causal Structure from Time Series Data [51.47827479376251]
LOCAL is a highly efficient, easy-to-implement, and constraint-free method for recovering dynamic causal structures.
It uses Asymptotic Causal Learning Mask (ACML) and Dynamic Graph Learning (DGPL).
Experiments on synthetic and real-world datasets demonstrate that LOCAL significantly outperforms existing methods.
arXiv Detail & Related papers (2024-10-25T10:48:41Z)
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.
CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs) that fuse GAM's flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations [0.0]
Self-supervised learning is the backbone of state-of-the-art language modeling.
It has been argued that training with predictive loss on a self-supervised dataset causes simulators.
arXiv Detail & Related papers (2023-11-29T09:32:56Z)
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
arXiv Detail & Related papers (2023-06-05T15:19:06Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Modular machine learning-based elastoplasticity: generalization in the context of limited data [0.0]
We discuss a hybrid framework that can work on a variable amount of data by relying on the modularity of the elastoplasticity formulation.
The discovered material models are found to not only interpolate well but also allow for accurate extrapolation in a thermodynamically consistent manner far outside the domain of the training data.
arXiv Detail & Related papers (2022-10-15T17:35:23Z)
- Enhancing Mechanical Metamodels with a Generative Model-Based Augmented Training Dataset [0.7734726150561089]
Microstructural patterns, which play a major role in defining the mechanical behavior of tissues, are difficult to simulate.
In this work, we investigate the efficacy of machine learning-based generative models as a tool for augmenting limited input pattern datasets.
We have created an open access dataset of Finite Element Analysis simulations based on Cahn-Hilliard patterns.
arXiv Detail & Related papers (2022-03-08T16:15:54Z)
- Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation [88.04759848307687]
In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.
We use Reinforcement Learning to train our model, and design a feature space divergence between our synthesized and target images that is key to successful training.
We also show that this leads to downstream improvement in the performance of an object detector trained on our generated dataset as opposed to other baseline simulation methods.
arXiv Detail & Related papers (2020-08-20T17:28:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.