Stochastic Threshold Model Trees: A Tree-Based Ensemble Method for
Dealing with Extrapolation
- URL: http://arxiv.org/abs/2009.09171v1
- Date: Sat, 19 Sep 2020 05:48:01 GMT
- Title: Stochastic Threshold Model Trees: A Tree-Based Ensemble Method for
Dealing with Extrapolation
- Authors: Kohei Numata and Kenichi Tanaka
- Abstract summary: In the development of new materials, it is desirable to search for compounds with unprecedented physical properties.
We propose Stochastic Threshold Model Trees (STMT), an extrapolation method that reflects the trend of the data while maintaining the accuracy of conventional interpolation methods.
In the case of the real data, although there is no significant overall improvement in accuracy, there is one compound for which the prediction accuracy is notably improved.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of chemistry, there have been many attempts to predict the
properties of unknown compounds from statistical models constructed using
machine learning. In an area where many known compounds are present (the
interpolation area), an accurate model can be constructed. In contrast, data in
areas where there are no known compounds (the extrapolation area) are generally
difficult to predict. However, in the development of new materials, it is
desirable to search this extrapolation area and discover compounds with
unprecedented physical properties. In this paper, we propose Stochastic
Threshold Model Trees (STMT), an extrapolation method that reflects the trend
of the data, while maintaining the accuracy of conventional interpolation
methods. The behavior of STMT is confirmed through experiments using both
artificial and real data. In the case of the real data, although there is no
significant overall improvement in accuracy, there is one compound for which
the prediction accuracy is notably improved, suggesting that STMT reflects the
data trends in the extrapolation area. We believe that the proposed method will
contribute to more efficient searches in situations such as new material
development.
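The contrast between interpolation and extrapolation described in the abstract can be made concrete with a minimal sketch. The code below is not the STMT algorithm from the paper; it is a hypothetical one-split example (all function names and the fixed threshold are assumptions made for illustration) that compares a tree leaf predicting a constant mean with a "model tree" leaf holding a linear fit. Outside the training range, the constant leaf stays flat while the linear leaf follows the trend of the data.

```python
# Hypothetical illustration only: NOT the authors' STMT implementation.
# It contrasts constant leaves (ordinary regression tree) with linear
# leaves (model tree) when predicting outside the training range.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for the samples in one leaf."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def fit_stump(xs, ys, threshold):
    """Split once at a fixed threshold; keep a mean and a line per leaf."""
    leaves = {}
    for side in (False, True):
        lx = [x for x in xs if (x > threshold) == side]
        ly = [y for x, y in zip(xs, ys) if (x > threshold) == side]
        leaves[side] = {"mean": sum(ly) / len(ly), "line": fit_line(lx, ly)}
    return leaves

def predict(leaves, threshold, x, linear):
    """Predict with the leaf's linear model (linear=True) or its mean."""
    leaf = leaves[x > threshold]
    if linear:
        a, b = leaf["line"]
        return a * x + b
    return leaf["mean"]

xs = [float(i) for i in range(11)]   # training inputs 0..10 (interpolation area)
ys = [2.0 * x + 1.0 for x in xs]     # noise-free linear trend, assumed for the demo
threshold = 5.0                      # assumed split point
leaves = fit_stump(xs, ys, threshold)

x_new = 20.0                         # far outside the training range
const_pred = predict(leaves, threshold, x_new, linear=False)
linear_pred = predict(leaves, threshold, x_new, linear=True)
print(const_pred, linear_pred)
```

In this toy setting the true value at `x_new` is 41, so the constant leaf badly underestimates it while the linear leaf recovers it. Real data are noisy and the choice of split threshold matters, which this sketch does not address.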
Related papers
- Extrapolative ML Models for Copolymers [1.901715290314837]
Machine learning models have been progressively used for predicting materials properties.
These models are inherently interpolative, and their efficacy for finding candidates outside the known range of a material property is unresolved.
Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach.
arXiv Detail & Related papers (2024-09-15T11:02:01Z)
- Emerging-properties Mapping Using Spatial Embedding Statistics: EMUSES [0.0]
EMUSES is an innovative approach to create high-dimensional embeddings that reveal latent structures within data.
By bridging the gap between predictive accuracy and interpretability, EMUSES offers researchers a powerful tool to understand the multifactorial origins of complex phenomena.
arXiv Detail & Related papers (2024-06-20T13:39:14Z)
- Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties [8.649679686652648]
We propose a general method for combining molecular descriptors with representation learning.
The proposed hybrid model exploits chemical structure information using graph neural networks.
It automatically detects cases where structure-based predictions are unreliable, in which case it corrects them by representation-learning based predictions.
arXiv Detail & Related papers (2024-06-12T10:51:00Z)
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Chemical Property Prediction Under Experimental Biases [26.407895054724452]
This study focuses on mitigating bias in the experimental datasets.
We adopted two techniques from causal inference combined with graph neural networks that can represent molecular structures.
The experimental results in four possible bias scenarios indicated that the inverse propensity scoring-based method and the counter-factual regression-based method made solid improvements.
arXiv Detail & Related papers (2020-09-18T08:40:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.