BenchML: an extensible pipelining framework for benchmarking
representations of materials and molecules at scale
- URL: http://arxiv.org/abs/2112.02287v1
- Date: Sat, 4 Dec 2021 09:07:16 GMT
- Title: BenchML: an extensible pipelining framework for benchmarking
representations of materials and molecules at scale
- Authors: Carl Poelking, Felix A. Faber, Bingqing Cheng
- Abstract summary: We introduce a machine-learning framework for benchmarking representations of chemical systems against datasets of materials and molecules.
The guiding principle is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes.
The resulting models are intended as baselines that can inform future method development.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a machine-learning (ML) framework for high-throughput
benchmarking of diverse representations of chemical systems against datasets of
materials and molecules. The guiding principle underlying the benchmarking
approach is to evaluate raw descriptor performance by limiting model complexity
to simple regression schemes while enforcing best ML practices, allowing for
unbiased hyperparameter optimization, and assessing learning progress through
learning curves along series of synchronized train-test splits. The resulting
models are intended as baselines that can inform future method development,
while also indicating how easily a given dataset can be learned. Through a
comparative analysis of the training outcome across a diverse set of
physicochemical, topological and geometric representations, we glean insight
into the relative merits of these representations as well as their
interrelatedness.
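As a rough illustration of the benchmarking protocol described above (a simple regression scheme, inner-loop hyperparameter search, and learning curves over synchronized train-test splits), the sketch below shows one way such a pipeline could look. The scikit-learn stack and the placeholder descriptor matrix are assumptions for illustration, not BenchML's actual API.

```python
# Minimal sketch of the protocol: fix a simple regressor, sweep the training
# fraction over synchronized train/test splits, record a learning curve.
# Dataset and descriptor are placeholders, not BenchML's actual API.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error

def learning_curve(X, y, train_sizes=(0.1, 0.2, 0.4, 0.8), n_splits=5, seed=0):
    """MAE vs. training fraction, averaged over synchronized random splits."""
    results = {}
    for frac in train_sizes:
        maes = []
        for split in range(n_splits):
            # Reusing the same split seeds across descriptors synchronizes
            # the splits, so representations are compared on identical data.
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=frac, random_state=seed + split)
            # Unbiased hyperparameter optimization via inner cross-validation.
            model = GridSearchCV(
                KernelRidge(kernel="rbf"),
                {"alpha": np.logspace(-6, 0, 7), "gamma": np.logspace(-3, 1, 5)},
                cv=3)
            model.fit(X_tr, y_tr)
            maes.append(mean_absolute_error(y_te, model.predict(X_te)))
        results[frac] = (np.mean(maes), np.std(maes))
    return results

# Usage: X = descriptor matrix (n_samples, n_features), y = target property.
```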
Related papers
- Analyzing Generative Models by Manifold Entropic Metrics [8.477943884416023]
We introduce a novel set of tractable information-theoretic evaluation metrics.
We compare various normalizing flow architectures and $\beta$-VAEs on the EMNIST dataset.
The most interesting finding of our experiments is a ranking of model architectures and training procedures in terms of their inductive bias to converge to aligned and disentangled representations during training.
arXiv Detail & Related papers (2024-10-25T09:35:00Z)
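As a hedged illustration of an entropy-style diagnostic for latent representations (a Gaussian stand-in, not the paper's actual metrics), one can score per-dimension entropies and the total correlation of latent codes:

```python
# Entropy-based diagnostic for latent codes under a Gaussian assumption.
# This is an illustrative stand-in, not the paper's actual metrics.
import numpy as np

def gaussian_entropy_diagnostics(Z):
    """Z: (n_samples, d) latent codes from an encoder or flow."""
    d = Z.shape[1]
    cov = np.cov(Z, rowvar=False)
    var = np.diag(cov)
    # Per-dimension differential entropy: 0.5 * log(2*pi*e * var_i)
    marginal_h = 0.5 * np.log(2 * np.pi * np.e * var)
    # Joint entropy of a Gaussian: 0.5 * log((2*pi*e)^d * det(cov))
    joint_h = 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])
    # Total correlation >= 0; near zero indicates statistically independent
    # (in this weak sense, disentangled) latent dimensions.
    total_correlation = marginal_h.sum() - joint_h
    return marginal_h, total_correlation

Z = np.random.default_rng(0).normal(size=(1000, 8))  # stand-in latent codes
print(gaussian_entropy_diagnostics(Z)[1])            # ~0 for independent dims
```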
- Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective [60.64922606733441]
We introduce a mathematical model that formalizes relational learning as hypergraph recovery in order to study the pre-training of Foundation Models (FMs).
In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style.
arXiv Detail & Related papers (2024-06-17T06:20:39Z)
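A toy reading of this framework: the world is a hidden hypergraph, data are random samples drawn from hyperedges, and pre-training amounts to recovering edge structure from co-occurrence statistics. The sketch below is purely illustrative, not the paper's formal construction:

```python
# Toy hypergraph-recovery picture: hidden hyperedges, data as pair samples
# from within edges, recovery via co-occurrence counting.
import random
from collections import Counter

world = [("paris", "france", "capital"), ("h2o", "water", "molecule"),
         ("paris", "seine", "river")]          # hidden hyperedges
rng = random.Random(0)

# Data: pairs sampled from within randomly chosen hyperedges (noise-free).
samples = []
for _ in range(500):
    edge = rng.choice(world)
    samples.append(tuple(sorted(rng.sample(edge, 2))))

# "Pre-training" stand-in: keep pairs that co-occur often; merging
# overlapping pairs back into full hyperedges is the recovery step.
counts = Counter(samples)
pairs = {p for p, c in counts.items() if c > 10}
print(pairs)  # pairwise skeleton of the hidden hypergraph
```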
- Methods for Class-Imbalanced Learning with Support Vector Machines: A Review and an Empirical Evaluation [22.12895887111828]
We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning.
We compare the performance of various representative SVM-based models in each category on benchmark imbalanced data sets.
Our findings reveal that while algorithmic methods are less time-consuming because they require no data pre-processing, fusion methods, which combine both re-sampling and algorithmic approaches, generally perform best.
arXiv Detail & Related papers (2024-06-05T15:55:08Z)
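The review's categories can be illustrated on a toy imbalanced dataset: an algorithmic method (class-weighted SVM, no pre-processing) against a re-sampling method (random oversampling). A minimal scikit-learn sketch, assuming nothing about the paper's exact experimental setup:

```python
# Algorithmic (class-weighted SVM) vs. re-sampling (random oversampling)
# on a synthetic imbalanced classification task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Algorithmic: reweight the hinge loss per class, train on the data as-is.
weighted = SVC(class_weight="balanced").fit(X_tr, y_tr)

# Re-sampling: duplicate minority samples until the classes are balanced.
minority = np.where(y_tr == 1)[0]
extra = np.random.default_rng(0).choice(
    minority, size=(y_tr == 0).sum() - minority.size)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])
resampled = SVC().fit(X_bal, y_bal)

for name, clf in [("class-weighted", weighted), ("oversampled", resampled)]:
    print(name, balanced_accuracy_score(y_te, clf.predict(X_te)))
```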
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
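A minimal picture of the tall-data setting, using a Gaussian toy model rather than the paper's diffusion-based method: each observation yields its own posterior, and the joint posterior is their precision-weighted product:

```python
# "Tall data": several i.i.d. observations of the same parameter. Each gives
# a Gaussian posterior; the joint posterior is their product. This sketches
# the aggregation problem, not the paper's diffusion-based solution.
import numpy as np

rng = np.random.default_rng(0)
theta_true, sigma = 2.0, 1.0
x = theta_true + sigma * rng.normal(size=10)   # ten observations

# Per-observation posteriors under a flat prior: N(x_i, sigma^2).
means, variances = x, np.full_like(x, sigma**2)

# Product of Gaussians: precisions add, means combine precision-weighted.
precision = (1.0 / variances).sum()
post_mean = (means / variances).sum() / precision
post_var = 1.0 / precision
print(post_mean, post_var)   # ~theta_true, sigma^2 / n
```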
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers [9.841285581456722]
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes.
We treat large language models (LLMs) as semantic solvers that execute tasks based on both natural and formal language instructions.
arXiv Detail & Related papers (2024-02-01T18:50:50Z)
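A hedged sketch of the "LLMs as semantic solvers" idea: route tasks in formal language to a conventional solver, and natural-language tasks to an LLM. The `complete` function below is a hypothetical stand-in for any LLM backend, not SymbolicAI's actual API:

```python
# Route formal tasks to a conventional solver and natural-language tasks to
# an LLM. `complete` is a hypothetical stand-in for an LLM backend.
import ast

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM backend here")

def solve(task: str) -> str:
    try:
        # Formal route: the task parses and evaluates as pure arithmetic.
        node = ast.parse(task, mode="eval")
        return str(eval(compile(node, "<task>", "eval"), {"__builtins__": {}}))
    except (SyntaxError, NameError):
        # Semantic route: defer to the LLM with a formal-output instruction.
        return complete("Execute the following instruction and return only "
                        "the result:\n" + task)

print(solve("3 * (4 + 5)"))   # handled formally -> "27"
```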
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
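For orientation, the sketch below implements the classical baseline that variational SMC builds on, a plain bootstrap particle filter on a linear-Gaussian model; it is a stand-in, not the paper's online variational scheme:

```python
# Bootstrap particle filter on a linear-Gaussian state-space model: the
# classical baseline underlying (online) variational SMC.
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 500                       # time steps, particles
a, q, r = 0.9, 0.5, 1.0              # transition coeff, process/obs noise

# Simulate a latent trajectory and noisy observations.
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t-1] + np.sqrt(q) * rng.normal()
ys = x + np.sqrt(r) * rng.normal(size=T)

particles = rng.normal(size=N)
means = []
for t in range(T):
    # Propagate through the transition kernel (bootstrap proposal).
    particles = a * particles + np.sqrt(q) * rng.normal(size=N)
    # Weight by the observation likelihood, then normalize.
    w = np.exp(-0.5 * (ys[t] - particles)**2 / r)
    w /= w.sum()
    means.append(np.dot(w, particles))
    # Multinomial resampling keeps the particle set from degenerating.
    particles = particles[rng.choice(N, size=N, p=w)]
print(np.mean((np.array(means) - x)**2))   # filtering MSE
```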
- On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization [0.0]
Implicit meta-learning (IML) requires computing second-order gradients, particularly the Hessian.
Various approximations for the Hessian have been proposed, but a systematic comparison of their compute cost, stability, generalization of the solution found, and estimation accuracy has been largely overlooked.
We show how training a "Confidence Network" to extract domain-specific features can learn to up-weight useful images and down-weight out-of-distribution samples.
arXiv Detail & Related papers (2023-10-28T15:50:03Z)
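The Hessian itself is rarely formed explicitly; one standard approximation such a comparison would cover is a Hessian-vector product via central finite differences of the gradient, sketched here on a quadratic toy function:

```python
# Hessian-vector product without forming the Hessian:
# H v ~ (g(x + eps*v) - g(x - eps*v)) / (2*eps), where g is the gradient.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])

def grad(x):                    # gradient of f(x) = 0.5 * x^T A x
    return A @ x

x = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])

eps = 1e-5
hvp = (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)
print(hvp, A @ v)               # finite-difference HVP vs. exact H v
```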
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
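A toy version of amortized likelihood-to-evidence ratio estimation, sketching the general classifier-based recipe rather than the paper's mutual-information formulation: train a classifier to separate joint (theta, x) pairs from shuffled pairs; its odds estimate the ratio:

```python
# Amortized likelihood-to-evidence ratio via classification: separate joint
# (theta, x) pairs (label 1) from shuffled marginal pairs (label 0); the
# classifier's odds p/(1-p) estimate p(x|theta)/p(x).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n = 5000
theta = rng.uniform(-3, 3, size=n)
x = theta + rng.normal(size=n)            # toy simulator: x ~ N(theta, 1)

joint = np.column_stack([theta, x])                      # label 1
marginal = np.column_stack([theta, rng.permutation(x)])  # label 0
X = np.vstack([joint, marginal])
y = np.r_[np.ones(n), np.zeros(n)]

# Quadratic features let the classifier represent the true log-ratio,
# which is quadratic in (theta, x) for this Gaussian simulator.
clf = make_pipeline(PolynomialFeatures(2),
                    LogisticRegression(max_iter=1000)).fit(X, y)
p = clf.predict_proba([[1.0, 1.2]])[0, 1]
print(p / (1 - p))   # estimated ratio p(x | theta) / p(x) at (1.0, 1.2)
```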
- Model-data-driven constitutive responses: application to a multiscale computational framework [0.0]
A hybrid methodology is presented which combines classical laws (model-based), a data-driven correction component, and computational multiscale approaches.
A model-based material representation is locally improved with data from lower scales obtained by means of a nonlinear numerical homogenization procedure.
In the proposed approach, both model and data play a fundamental role, allowing for the synergistic integration of a physics-based response with a machine-learning black box.
arXiv Detail & Related papers (2021-04-06T16:34:46Z)
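A one-dimensional toy of this model-data split, assuming a synthetic "lower-scale" reference response instead of an actual homogenization procedure: a linear-elastic law supplies the physics, and a regressor fits the residual it misses:

```python
# Hybrid constitutive response: physics-based linear elasticity plus a
# data-driven correction fitted to the residual against synthetic
# lower-scale data. A sketch of the model+data split only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

E = 100.0                                   # Young's modulus (model-based)
strain = np.linspace(0, 0.1, 200).reshape(-1, 1)
# Synthetic "lower-scale" reference: elasticity plus unmodeled softening.
stress_ref = E * strain.ravel() - 2000 * strain.ravel()**3

physics = E * strain.ravel()                # model-based prediction
residual = stress_ref - physics             # what the physics law misses
correction = GradientBoostingRegressor().fit(strain, residual)

def hybrid_stress(eps):
    eps = np.atleast_2d(eps).T
    return E * eps.ravel() + correction.predict(eps)

print(hybrid_stress([0.05]), E * 0.05 - 2000 * 0.05**3)  # hybrid vs. reference
```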
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
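A strongly simplified two-stage analogue, with PCA standing in for both the penalty-based stage-1 method and the stage-2 generative model: stage 1 yields a low-quality reconstruction, and stage 2 models the residual structure it discarded:

```python
# Two-stage reconstruction sketch: a low-capacity stage-1 factor model,
# then a second model fitted to the residual it left behind.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 4))        # ground-truth latent factors
X = factors @ rng.normal(size=(4, 20)) + 0.3 * rng.normal(size=(1000, 20))

# Stage 1: low-capacity factor model => blurry reconstruction.
stage1 = PCA(n_components=2).fit(X)
X_hat = stage1.inverse_transform(stage1.transform(X))
residual = X - X_hat

# Stage 2: a second model captures correlated structure stage 1 discarded.
stage2 = PCA(n_components=2).fit(residual)
X_final = X_hat + stage2.inverse_transform(stage2.transform(residual))

for name, rec in [("stage 1 only", X_hat), ("stage 1 + 2", X_final)]:
    print(name, "reconstruction MSE:", np.mean((X - rec) ** 2))
```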
- Meta-learning framework with applications to zero-shot time-series forecasting [82.61728230984099]
This work provides positive evidence using a broad meta-learning framework in which residual connections act as a meta-learning adaptation mechanism.
We show that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining.
arXiv Detail & Related papers (2020-02-07T16:39:43Z)
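The zero-shot transfer claim can be sketched with a toy forecaster: fit a window-to-next-value model on a source series and apply it unchanged to a target series. An MLP is used here for brevity, not the paper's residual-connection architecture:

```python
# Zero-shot time-series transfer: train a window-to-next-value forecaster on
# a source series, deploy it on a different target series with no retraining.
import numpy as np
from sklearn.neural_network import MLPRegressor

def windows(series, w=24):
    X = np.lib.stride_tricks.sliding_window_view(series, w + 1)
    return X[:, :w], X[:, w]

rng = np.random.default_rng(0)
t = np.arange(2000)
source = np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=t.size)
target = np.sin(2 * np.pi * t / 35 + 1.0) + 0.1 * rng.normal(size=t.size)

X_src, y_src = windows(source)
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                     random_state=0).fit(X_src, y_src)

X_tgt, y_tgt = windows(target)      # deployed with no retraining
print("zero-shot MAE:", np.mean(np.abs(model.predict(X_tgt) - y_tgt)))
```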
This list is automatically generated from the titles and abstracts of the papers on this site.