Related papers: Missing Data Imputation using Neural Cellular Automata

Missing Data Imputation using Neural Cellular Automata

URL: http://arxiv.org/abs/2509.00651v2
Date: Sat, 06 Sep 2025 11:04:54 GMT
Title: Missing Data Imputation using Neural Cellular Automata
Authors: Tin Luu, Binh Nguyen, Man Ngo,
Abstract summary: We propose a novel imputation method inspired by Neural Cellular Automata (NCA)<n>We show that, with some appropriate adaptations, an NCA-based model is able to address the missing data imputation problem.<n>We also provide several experiments to evidence that our model outperforms state-of-the-art methods in terms of imputation error and post-imputation performance.
Score: 4.515234652925347
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When working with tabular data, missingness is always one of the most painful problems. Throughout many years, researchers have continuously explored better and better ways to impute missing data. Recently, with the rapid development evolution in machine learning and deep learning, there is a new trend of leveraging generative models to solve the imputation task. While the imputing version of famous models such as Variational Autoencoders or Generative Adversarial Networks were investigated, prior work has overlooked Neural Cellular Automata (NCA), a powerful computational model. In this paper, we propose a novel imputation method that is inspired by NCA. We show that, with some appropriate adaptations, an NCA-based model is able to address the missing data imputation problem. We also provide several experiments to evidence that our model outperforms state-of-the-art methods in terms of imputation error and post-imputation performance.

Related papers

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets [1.02138250640885]
"Not Another Imputation Method" (NAIM) is a transformer-based model designed to handle missing values without traditional imputation techniques.<n>NAIM's ability to avoid the necessity of imputing missing values and to effectively learn from available data relies on two main techniques.<n>We extensively evaluated NAIM on 5 publicly available datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 5 deep learning models.
arXiv Detail & Related papers (2024-07-16T09:43:47Z)
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation. We use the score-based diffusion model to generate labeled data. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
Toward Physically Plausible Data-Driven Models: A Novel Neural Network Approach to Symbolic Regression [2.7071541526963805]
This paper proposes a novel neural network-based symbolic regression method. It constructs physically plausible models based on even very small training data sets and prior knowledge about the system. We experimentally evaluate the approach on four test systems: the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the longitudinal force of the anti-lock braking system.
arXiv Detail & Related papers (2023-02-01T22:05:04Z)
Deeply-Learned Generalized Linear Models with Missing Data [6.302686933168439]
We provide a formal treatment of missing data in the context of deeply learned generalized linear models. We propose a new architecture, textitdlglm, that is able to flexibly account for both ignorable and non-ignorable patterns of missingness. We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository.
arXiv Detail & Related papers (2022-07-18T20:00:13Z)
EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting. We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model. We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
arXiv Detail & Related papers (2022-02-21T10:34:41Z)
Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples. This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data. MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism. We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GANs) and GAN-based techniques have attracted attention as unsupervised machine learning methods. We name our proposed method as Con Conval Generative Adversarial Imputation Nets (Conv-GAIN)
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues [22.62510395932645]
Data missingness and quality are common problems in machine learning, especially for high-stakes applications such as healthcare. We propose a novel neural network modification to mitigate the impacts of low quality and missing data. Our results suggest that explicitly accounting for reduced information quality with a modulating fully connected layer can enable the deployment of artificial intelligence systems in real-time applications.
arXiv Detail & Related papers (2021-07-19T01:29:16Z)
A Framework for Machine Learning of Model Error in Dynamical Systems [7.384376731453594]
We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from data. We cast the problem in both continuous- and discrete-time, for problems in which the model error is memoryless and in which it has significant memory. We find that hybrid methods substantially outperform solely data-driven approaches in terms of data hunger, demands for model complexity, and overall predictive performance.
arXiv Detail & Related papers (2021-07-14T12:47:48Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.