Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via
Mixed-Effect Models and Hierarchical Clustering
- URL: http://arxiv.org/abs/2308.06399v5
- Date: Mon, 15 Jan 2024 09:07:07 GMT
- Title: Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via
Mixed-Effect Models and Hierarchical Clustering
- Authors: Lorenzo Valleggi and Marco Scutari and Federico Mattia Stefanini
- Abstract summary: Maize, cultivated globally and especially in sub-Saharan Africa, Asia, and Latin America, occupied 197 million hectares as of 2021.
Various statistical and machine learning models, including mixed-effect models, random coefficients models, random forests, and deep learning architectures, have been devised to predict maize yield.
This study introduces an innovative approach integrating random effects into Bayesian networks (BNs), leveraging their capacity to model causal and probabilistic relationships through directed acyclic graphs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maize, a crucial crop globally cultivated across vast regions, especially in
sub-Saharan Africa, Asia, and Latin America, occupies 197 million hectares as
of 2021. Various statistical and machine learning models, including
mixed-effect models, random coefficients models, random forests, and deep
learning architectures, have been devised to predict maize yield. These models
consider factors such as genotype, environment, genotype-environment
interaction, and field management. However, the existing models often fall
short of fully exploiting the complex network of causal relationships among
these factors and the hierarchical structure inherent in agronomic data. This
study introduces an innovative approach integrating random effects into
Bayesian networks (BNs), leveraging their capacity to model causal and
probabilistic relationships through directed acyclic graphs. Rooted in the
linear mixed-effects models framework and tailored for hierarchical data, this
novel approach demonstrates enhanced BN learning. Application to a real-world
agronomic trial produces a model with improved interpretability, unveiling new
causal connections. Notably, the proposed method significantly reduces the
error rate in maize yield prediction from 28% to 17%. These results advocate
for the preference of BNs in constructing practical decision support tools for
hierarchical agronomic data, facilitating causal inference.
Related papers
- An unified approach to link prediction in collaboration networks [0.0]
This article investigates and compares three approaches to link prediction in collaboration networks.
The ERGM is employed to capture general structural patterns within the network.
The GCN and Word2Vec+MLP models leverage deep learning techniques to learn adaptive structural representations of nodes and their relationships.
arXiv Detail & Related papers (2024-11-01T22:40:39Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction [0.0]
This study implements 6 regression models to predict crop yields in 37 developing countries over 27 years.
Given 4 key training parameters, insecticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), it was found that our Random Forest Regression model achieved a determination coefficient (r2) of 0.94, with a margin of error (ME) of 0.03.
arXiv Detail & Related papers (2023-12-04T18:45:28Z)
- From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics.
We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions [3.441021278275805]
We consider the problem of recovering the causal structure underlying observations when the targets of the interventions in each experiment are unknown.
We derive a greedy algorithm called GnIES to recover the equivalence class of the data-generating model without knowledge of the intervention targets.
We leverage this procedure and evaluate the performance of GnIES on synthetic, real, and semi-synthetic data sets.
arXiv Detail & Related papers (2022-11-27T17:37:21Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions [14.003044924094597]
We consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models.
The study is motivated by the US Census Bureau's well-known ROAM application.
arXiv Detail & Related papers (2021-08-24T17:49:55Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show that our proposed method is competitive with the state of the art in simulation settings and on real data from large-scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)