Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via
Mixed-Effect Models and Hierarchical Clustering
- URL: http://arxiv.org/abs/2308.06399v5
- Date: Mon, 15 Jan 2024 09:07:07 GMT
- Title: Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via
Mixed-Effect Models and Hierarchical Clustering
- Authors: Lorenzo Valleggi and Marco Scutari and Federico Mattia Stefanini
- Abstract summary: Maize is cultivated globally, especially in sub-Saharan Africa, Asia, and Latin America, and occupied 197 million hectares as of 2021.
Various statistical and machine learning models, including mixed-effect models, random coefficients models, random forests, and deep learning architectures, have been devised to predict maize yield.
This study introduces an innovative approach integrating random effects into Bayesian networks (BNs), leveraging their capacity to model causal and probabilistic relationships through directed acyclic graphs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maize, a crucial crop globally cultivated across vast regions, especially in
sub-Saharan Africa, Asia, and Latin America, occupies 197 million hectares as
of 2021. Various statistical and machine learning models, including
mixed-effect models, random coefficients models, random forests, and deep
learning architectures, have been devised to predict maize yield. These models
consider factors such as genotype, environment, genotype-environment
interaction, and field management. However, the existing models often fall
short of fully exploiting the complex network of causal relationships among
these factors and the hierarchical structure inherent in agronomic data. This
study introduces an innovative approach integrating random effects into
Bayesian networks (BNs), leveraging their capacity to model causal and
probabilistic relationships through directed acyclic graphs. Rooted in the
linear mixed-effects models framework and tailored for hierarchical data, this
novel approach demonstrates enhanced BN learning. Application to a real-world
agronomic trial produces a model with improved interpretability, unveiling new
causal connections. Notably, the proposed method significantly reduces the
error rate in maize yield prediction from 28% to 17%. These results support
the use of BNs to construct practical decision support tools for
hierarchical agronomic data, facilitating causal inference.
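To make the idea of a cluster-level effect inside a single BN node concrete, here is a minimal sketch. It is not the authors' implementation. It assumes one node `y` with one parent `x` and a cluster-specific intercept, uses simulated data, and estimates the intercepts by within-cluster centering rather than by the maximum-likelihood or REML fitting that a real linear mixed-effects model would use. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate(n_clusters=5, n_per_cluster=40, slope=2.0, seed=0):
    """Generate (cluster, x, y) rows with one random intercept per cluster."""
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        intercept = rng.gauss(0.0, 3.0)  # assumed cluster-level effect
        for _ in range(n_per_cluster):
            x = rng.uniform(0.0, 1.0)
            y = intercept + slope * x + rng.gauss(0.0, 0.5)
            rows.append((cluster, x, y))
    return rows


def rmse(residuals):
    """Root mean squared error of a list of residuals."""
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


def pooled_rmse(rows):
    """Fit y = a + b*x while ignoring clusters; return the in-sample RMSE."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return rmse([y - (a + b * x) for _, x, y in rows])


def cluster_intercept_fit(rows):
    """Fit y = a_cluster + b*x with a shared slope and per-cluster intercepts.

    Returns (slope, intercepts by cluster, in-sample RMSE).
    """
    groups = {}
    for cluster, x, y in rows:
        groups.setdefault(cluster, []).append((x, y))
    # Per-cluster means of x and y, used to center the data within clusters.
    centers = {
        g: (mean(x for x, _ in pts), mean(y for _, y in pts))
        for g, pts in groups.items()
    }
    sxy = sum((x - centers[g][0]) * (y - centers[g][1]) for g, x, y in rows)
    sxx = sum((x - centers[g][0]) ** 2 for g, x, _ in rows)
    slope = sxy / sxx
    intercepts = {g: my - slope * mx for g, (mx, my) in centers.items()}
    residuals = [y - (intercepts[g] + slope * x) for g, x, y in rows]
    return slope, intercepts, rmse(residuals)


if __name__ == "__main__":
    data = simulate()
    slope, intercepts, clustered = cluster_intercept_fit(data)
    print("pooled RMSE:", pooled_rmse(data))
    print("cluster-intercept RMSE:", clustered)
    print("estimated shared slope:", slope)
```

The comparison here is an in-sample fit on synthetic data, so it only illustrates why ignoring cluster structure inflates residual error. It does not reproduce the 28% and 17% error rates reported in the abstract.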
Related papers
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction [0.0]
This study implements 6 regression models to predict crop yields in 37 developing countries over 27 years.
Given four key training parameters, insecticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), it was found that our Random Forest Regression model achieved a determination coefficient (r²) of 0.94, with a margin of error (ME) of 0.03.
arXiv Detail & Related papers (2023-12-04T18:45:28Z)
- From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics.
We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z)
- Scaling Laws Do Not Scale [87.76714490248779]
We argue that as the size of datasets used to train large AI models grows, the number of distinct communities may grow.
As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by metrics used to evaluate model performance.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions [3.441021278275805]
We consider the problem of recovering the causal structure underlying observations when the targets of the interventions in each experiment are unknown.
We derive a greedy algorithm called GnIES to recover the equivalence class of the data-generating model without knowledge of the intervention targets.
We leverage this procedure and evaluate the performance of GnIES on synthetic, real, and semi-synthetic data sets.
arXiv Detail & Related papers (2022-11-27T17:37:21Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- De-Biasing Generative Models using Counterfactual Methods [0.0]
We propose a new decoder-based framework named the Causal Counterfactual Generative Model (CCGM).
Our proposed method combines a causal latent space VAE model with specific modification to emphasize causal fidelity.
We explore how better disentanglement of causal learning and encoding/decoding generates higher causal intervention quality.
arXiv Detail & Related papers (2022-07-04T16:53:20Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions [14.003044924094597]
We consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models.
The study is motivated by the US Census Bureau's well-known ROAM application.
arXiv Detail & Related papers (2021-08-24T17:49:55Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show that our proposed method is competitive with the state of the art in simulation settings and on real data from large-scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.