Biased Hypothesis Formation From Projection Pursuit
- URL: http://arxiv.org/abs/2201.00889v1
- Date: Mon, 3 Jan 2022 22:02:26 GMT
- Title: Biased Hypothesis Formation From Projection Pursuit
- Authors: John Patterson, Chris Avery, Tyler Grear, Donald J. Jacobs
- Abstract summary: The effect of bias on hypothesis formation is characterized for an automated data-driven projection pursuit neural network.
This intelligent exploratory process partitions a complete vector state space into disjoint subspaces to create working hypotheses.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The effect of bias on hypothesis formation is characterized for an automated
data-driven projection pursuit neural network to extract and select features
for binary classification of data streams. This intelligent exploratory process
partitions a complete vector state space into disjoint subspaces to create
working hypotheses quantified by similarities and differences observed between
two groups of labeled data streams. Data streams are typically time sequenced,
and may exhibit complex spatio-temporal patterns. For example, given atomic
trajectories from molecular dynamics simulation, the machine's task is to
quantify dynamical mechanisms that promote function by comparing protein
mutants, some known to function while others are nonfunctional. Utilizing
synthetic two-dimensional molecules that mimic the dynamics of functional and
nonfunctional proteins, biases are identified and controlled in both the
machine learning model and selected training data under different contexts. The
refinement of a working hypothesis converges to a statistically robust
multivariate perception of the data based on a context-dependent perspective.
Including diverse perspectives during data exploration enhances
interpretability of the multivariate characterization of similarities and
differences.
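The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's neural network: it substitutes a fixed PCA basis for the learned projections and a simple effect-size cutoff for the paper's selection criteria. The function name `partition_subspaces`, the `threshold` parameter, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def partition_subspaces(group_a, group_b, threshold=1.0):
    """Split a complete orthonormal basis into 'different' and 'similar' directions.

    group_a, group_b: arrays of shape (n_samples, n_features), one per labeled group.
    threshold: hypothetical cutoff on the per-direction effect size.
    """
    pooled = np.vstack([group_a, group_b])
    # Eigenvectors of the pooled covariance give a complete orthonormal basis
    # (a PCA stand-in for learned projections; an assumption of this sketch).
    _, eigvecs = np.linalg.eigh(np.cov(pooled, rowvar=False))
    basis = eigvecs.T  # one basis vector per row

    # Project each group onto every basis vector.
    proj_a = group_a @ basis.T
    proj_b = group_b @ basis.T

    # Effect size per direction: gap between group means in units of pooled spread.
    spread = np.sqrt(0.5 * (proj_a.var(axis=0) + proj_b.var(axis=0))) + 1e-12
    score = np.abs(proj_a.mean(axis=0) - proj_b.mean(axis=0)) / spread

    different = basis[score >= threshold]  # directions where the groups differ
    similar = basis[score < threshold]     # directions where the groups look alike
    return different, similar, score


# Synthetic example: two groups that differ only along the first feature.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(200, 4))
group_b = rng.normal(size=(200, 4))
group_b[:, 0] += 5.0

different, similar, score = partition_subspaces(group_a, group_b)
print(different.shape[0], "discriminating direction(s),", similar.shape[0], "similar")
```

Because the two returned blocks come from one orthonormal basis, they span disjoint subspaces that together cover the whole feature space, which loosely mirrors the "complete vector state space" partition described above.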
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Data-Error Scaling in Machine Learning on Natural Discrete Combinatorial Mutation-prone Sets: Case Studies on Peptides and Small Molecules [0.0]
We investigate trends in the data-error scaling behavior of machine learning (ML) models trained on discrete spaces that are prone to mutation.
In contrast to typical data-error scaling, our results showed discontinuous monotonic phase transitions during learning.
We present an alternative strategy to normalize learning curves and the concept of mutant-based shuffling.
arXiv Detail & Related papers (2024-05-08T16:04:50Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Dynamical Regimes of Diffusion Models [14.797301819675454]
We study generative diffusion models in the regime where the dimension of space and the number of data are large.
Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process.
The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models.
arXiv Detail & Related papers (2024-02-28T17:19:26Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning instantiated on structured spaces with a simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- MoReL: Multi-omics Relational Learning [26.484803417186384]
We propose a novel deep Bayesian generative model to efficiently infer a multi-partite graph encoding molecular interactions across heterogeneous views.
Such an optimal transport regularization in the deep Bayesian generative model not only allows incorporating view-specific side information but also increases model flexibility through the distribution-based regularization.
arXiv Detail & Related papers (2022-03-15T02:50:07Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- Model discovery in the sparse sampling regime [0.0]
We show how deep learning can improve model discovery of partial differential equations.
As a result, deep learning-based model discovery makes it possible to recover the underlying equations.
We illustrate our claims on both synthetic and experimental sets.
arXiv Detail & Related papers (2021-05-02T06:27:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.