A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model
- URL: http://arxiv.org/abs/2601.13476v1
- Date: Tue, 20 Jan 2026 00:17:54 GMT
- Title: A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model
- Authors: Jinhao Li, Hao Wang,
- Abstract summary: We develop a novel PRobabilistic variational imputation framework for electric vehicle charging data.<n>PRAIM employs a pre-trained language model to encode heterogeneous data, spanning time-series demand, calendar features, and geospatial context.<n>PRAIM significantly outperforms established baselines in both imputation accuracy and its ability to preserve the original data's statistical distribution.
- Score: 3.720005287197028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reliability of data-driven applications in electric vehicle (EV) infrastructure, such as charging demand forecasting, hinges on the availability of complete, high-quality charging data. However, real-world EV datasets are often plagued by missing records, and existing imputation methods are ill-equipped for the complex, multimodal context of charging data, often relying on a restrictive one-model-per-station paradigm that ignores valuable inter-station correlations. To address these gaps, we develop a novel PRobabilistic variational imputation framework that leverages the power of large lAnguage models and retrIeval-augmented Memory (PRAIM). PRAIM employs a pre-trained language model to encode heterogeneous data, spanning time-series demand, calendar features, and geospatial context, into a unified, semantically rich representation. This is dynamically fortified by retrieval-augmented memory that retrieves relevant examples from the entire charging network, enabling a single, unified imputation model empowered by variational neural architecture to overcome data sparsity. Extensive experiments on four public datasets demonstrate that PRAIM significantly outperforms established baselines in both imputation accuracy and its ability to preserve the original data's statistical distribution, leading to substantial improvements in downstream forecasting performance.
Related papers
- Multi-environment Invariance Learning with Missing Data [0.0]
In this work, we establish non-asymptotic guarantees on variable selection property and $ell$ error convergence rates.<n>We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset.
arXiv Detail & Related papers (2026-01-12T06:30:58Z) - Learning More with Less: A Generalizable, Self-Supervised Framework for Privacy-Preserving Capacity Estimation with EV Charging Data [84.37348569981307]
We propose a first-of-its-kind capacity estimation model based on self-supervised pre-training.<n>Our model consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-05T08:58:35Z) - Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices.<n>We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD)<n>The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller)
arXiv Detail & Related papers (2025-07-14T16:19:00Z) - PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders [42.8983261737774]
We propose a novel generative model that learns from limited data by incorporating physical constraints to enhance performance.<n>We extend the VAE architecture by incorporating physical models in the generative process, enabling it to capture underlying dynamics more effectively.<n>We demonstrate that PIGPVAE can produce realistic samples beyond the observed distribution, highlighting its robustness and usefulness under distribution shifts.
arXiv Detail & Related papers (2025-05-25T21:12:01Z) - CAAT-EHR: Cross-Attentional Autoregressive Transformer for Multimodal Electronic Health Record Embeddings [0.0]
We introduce CAAT-EHR, a novel architecture designed to generate task-agnostic longitudinal embeddings from raw EHR data.<n>An autoregressive decoder complements the encoder by predicting future time points data during pre-training, ensuring that the resulting embeddings maintain temporal consistency and alignment.
arXiv Detail & Related papers (2025-01-31T05:00:02Z) - AdaPRL: Adaptive Pairwise Regression Learning with Uncertainty Estimation for Universal Regression Tasks [0.0]
We propose a novel adaptive pairwise learning framework for regression tasks (AdaPRL)<n>AdaPRL leverages the relative differences between data points and with deep probabilistic models to quantify the uncertainty associated with predictions.<n> Experiments show that AdaPRL can be seamlessly integrated into recently proposed regression frameworks to gain performance improvement.
arXiv Detail & Related papers (2025-01-10T09:19:10Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Robust Data-Driven Error Compensation for a Battery Model [0.0]
Today's massively collected battery data is not yet used for more accurate and reliable simulations.
A data-driven error model is introduced enhancing an existing physically motivated model.
A neural network compensates the existing dynamic error and is further limited based on a description of the underlying data.
arXiv Detail & Related papers (2020-12-31T16:11:36Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.