Related papers: A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction

A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction

URL: http://arxiv.org/abs/2306.11942v1
Date: Tue, 20 Jun 2023 23:39:06 GMT
Title: A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction
Authors: Yogesh Bansal, David Lillis, Mohand Tahar Kechadi
Abstract summary: Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction. The existing LSTM cannot handle heterogeneous datasets. We propose an efficient deep learning model that can deal with heterogeneous datasets.
Score: 0.6595290783361958
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Western countries rely heavily on wheat, and yield prediction is crucial. Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction. Existing literature reported that they perform better than traditional Machine Learning (ML) models. However, the existing LSTM cannot handle heterogeneous datasets (a combination of data which varies and remains static with time). In this paper, we propose an efficient deep learning model that can deal with heterogeneous datasets. We developed the system architecture and applied it to the real-world dataset in the digital agriculture area. We showed that it outperforms the existing ML models.

Related papers

Tessellated Linear Model for Age Prediction from Voice [29.0093388690853]
Tessellated Linear Model (TLM) is a piecewise linear approach that combines the simplicity of linear models with the capacity of non-linear functions. We evaluate TLM on the TIMIT dataset on the task of age prediction from voice, where it outperformed state-of-the-art deep learning models.
arXiv Detail & Related papers (2025-01-16T01:28:45Z)
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Large Language Model (LLM) pretraining traditionally relies on autoregressive language modeling on randomly sampled data blocks from web-scale datasets. We take inspiration from human learning techniques like spaced repetition to hypothesize that random data sampling for LLMs leads to high training cost and low quality models which tend to forget data. In order to effectively commit web-scale information to long-term memory, we propose the LFR (Learn, Focus, and Review) pedagogy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
LAESI: Leaf Area Estimation with Synthetic Imagery [13.145253458335464]
We introduce LAESI, a Synthetic Leaf dataset of 100,000 synthetic leaf images on millimeter paper. This dataset provides a resource for leaf morphology analysis aimed at beech and oak leaves. We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation.
arXiv Detail & Related papers (2024-03-31T07:56:07Z)
Modified CycleGAN for the synthesization of samples for wheat head segmentation [0.09999629695552192]
In the absence of an annotated dataset, synthetic data can be used for model development. We develop a realistic annotated synthetic dataset for wheat head segmentation. The resulting model achieved a Dice score of 83.4% on an internal dataset and 83.6% on two external Global Wheat Head Detection datasets.
arXiv Detail & Related papers (2024-02-23T06:42:58Z)
Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets. The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv Detail & Related papers (2024-02-03T00:38:08Z)
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning. Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset. We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU) We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting. Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data. Give access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
Impact of Dataset on Acoustic Models for Automatic Speech Recognition [0.0]
In Automatic Speech Recognition, GMM-HMM had been widely used for acoustic modelling. The GMM models are widely used to create the alignments of the training data for the hybrid deep neural network model. This work aims to investigate the impact of dataset size variations on the performance of various GMM-HMM Acoustic Models.
arXiv Detail & Related papers (2022-03-25T11:41:49Z)
VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven Model Interpretability Applied to the Ironmaking Industry [70.10343492784465]
It is necessary to expose to the process engineer, not solely the model predictions, but also their interpretability. Model-agnostic local interpretability solutions based on LIME have recently emerged to improve the original method. We present in this paper a novel approach, VAE-LIME, for local interpretability of data-driven models forecasting the temperature of the hot metal produced by a blast furnace.
arXiv Detail & Related papers (2020-07-15T07:07:07Z)
Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model. We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information. Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance-art in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks. Our results show that recurrent models produce near perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.