Related papers: Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering

Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering

URL: http://arxiv.org/abs/2308.06399v5
Date: Mon, 15 Jan 2024 09:07:07 GMT
Title: Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering
Authors: Lorenzo Valleggi and Marco Scutari and Federico Mattia Stefanini
Abstract summary: maize occupies 197 million hectares as of 2021 in sub-Saharan Africa, Asia, and Latin America. Various statistical and machine learning models, including mixed-effect models, random coefficients models, random forests, and deep learning architectures, have been devised to predict maize yield. This study introduces an innovative approach integrating random effects into Bayesian networks (BNs), leveraging their capacity to model causal and probabilistic relationships through directed acyclic graphs.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Maize, a crucial crop globally cultivated across vast regions, especially in sub-Saharan Africa, Asia, and Latin America, occupies 197 million hectares as of 2021. Various statistical and machine learning models, including mixed-effect models, random coefficients models, random forests, and deep learning architectures, have been devised to predict maize yield. These models consider factors such as genotype, environment, genotype-environment interaction, and field management. However, the existing models often fall short of fully exploiting the complex network of causal relationships among these factors and the hierarchical structure inherent in agronomic data. This study introduces an innovative approach integrating random effects into Bayesian networks (BNs), leveraging their capacity to model causal and probabilistic relationships through directed acyclic graphs. Rooted in the linear mixed-effects models framework and tailored for hierarchical data, this novel approach demonstrates enhanced BN learning. Application to a real-world agronomic trial produces a model with improved interpretability, unveiling new causal connections. Notably, the proposed method significantly reduces the error rate in maize yield prediction from 28% to 17%. These results advocate for the preference of BNs in constructing practical decision support tools for hierarchical agronomic data, facilitating causal inference.

Related papers

Deep Learning Meets Process-Based Models: A Hybrid Approach to Agricultural Challenges [3.953669132390006]
Process-based models (PBMs) and deep learning (DL) are two key approaches in agricultural modelling, each offering distinct advantages and limitations. In contrast, DL models excel at capturing complex, nonlinear patterns from large datasets but may suffer from limited interpretability, high computational demands, and overfitting in data-scarce scenarios. This study presents a systematic review of PBMs, DL models, and hybrid PBM-DL frameworks, highlighting their applications in agricultural and environmental modelling.
arXiv Detail & Related papers (2025-04-22T06:18:33Z)
Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
An unified approach to link prediction in collaboration networks [0.0]
This article investigates and compares three approaches to link prediction in colaboration networks. The ERGM is employed to capture general structural patterns within the network. The GCN and Word2Vec+MLP models leverage deep learning techniques to learn adaptive structural representations of nodes and their relationships.
arXiv Detail & Related papers (2024-11-01T22:40:39Z)
Tree-based variational inference for Poisson log-normal models [47.82745603191512]
hierarchical trees are often used to organize entities based on proximity criteria.<n>Current count-data models do not leverage this structured information.<n>We introduce the PLN-Tree model as an extension of the PLN model for modeling hierarchical count data.
arXiv Detail & Related papers (2024-06-25T08:24:35Z)
Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction [0.0]
This study implements 6 regression models to predict crop yields in 37 developing countries over 27 years. Given 4 key training parameters, insecticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), it was found that our Random Forest Regression model achieved a determination coefficient (r2) of 0.94, with a margin of error (ME) of.03.
arXiv Detail & Related papers (2023-12-04T18:45:28Z)
From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics. We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z)
Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work. Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions [3.441021278275805]
We consider the problem of recovering the causal structure underlying observations when the targets of the interventions in each experiment are unknown. We derive a greedy algorithm called GnIES to recover the equivalence class of the data-generating model without knowledge of the intervention targets. We leverage this procedure and evaluate the performance of GnIES on synthetic, real, and semi-synthetic data sets.
arXiv Detail & Related papers (2022-11-27T17:37:21Z)
Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations. We study how data heterogeneity affects the representations of the globally aggregated models. We propose sc FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. We propose a general framework to solve the above two challenges simultaneously. We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions [14.003044924094597]
We consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models. The study is motivated by the US Census Bureau's well-known ROAM application.
arXiv Detail & Related papers (2021-08-24T17:49:55Z)
A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling. We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk. We show our proposed method is competitive with the state-of-the-art in simulation setting and on real data from large scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.