Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling
- URL: http://arxiv.org/abs/2508.21785v2
- Date: Fri, 17 Oct 2025 07:11:23 GMT
- Title: Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling
- Authors: Peng Yang, Zhengdong Huang, Zicheng Xie, Wentao Tian, Jingyu Liu, Lunhong Dong,
- Abstract summary: Heart rate prediction is vital for personalized health monitoring and fitness, but it frequently faces a critical challenge when deploying in real-world data.<n>We classify it in two key dimensions: source heterogeneity from fragmented device markets with varying feature sets, and user heterogeneity reflecting distinct physiological patterns across individuals and activities.<n>Existing methods either discard device-specific information, or fail to model user-specific differences, limiting their real-world performance.<n>We propose a framework that learns latent representations to both heterogeneity, enabling downstream predictors to work consistently under heterogeneous data patterns.
- Score: 5.538168530326637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heart rate prediction is vital for personalized health monitoring and fitness, while it frequently faces a critical challenge when deploying in real-world: data heterogeneity. We classify it in two key dimensions: source heterogeneity from fragmented device markets with varying feature sets, and user heterogeneity reflecting distinct physiological patterns across individuals and activities. Existing methods either discard device-specific information, or fail to model user-specific differences, limiting their real-world performance. To address this, we propose a framework that learns latent representations agnostic to both heterogeneity, enabling downstream predictors to work consistently under heterogeneous data patterns. Specifically, we introduce a random feature dropout strategy to handle source heterogeneity, making the model robust to various feature sets. To manage user heterogeneity, we employ a time-aware attention module to capture long-term physiological traits and use a contrastive learning objective to build a discriminative representation space. To reflect the heterogeneous nature of real-world data, we created and publicly released a new benchmark dataset, ParroTao. Evaluations on both ParroTao and the public FitRec dataset show that our model significantly outperforms existing baselines by 17% and 15%, respectively. Furthermore, analysis of the learned representations demonstrates their strong discriminative power, and one downstream application task confirm the practical value of our model.
Related papers
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.<n>We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - MixRec: Heterogeneous Graph Collaborative Filtering [21.96510707666373]
We present a graph collaborative filtering model MixRec to disentangling users' multi-behavior interaction patterns.<n>Our model achieves this by incorporating intent disentanglement and multi-behavior modeling.<n>We also introduce a novel contrastive learning paradigm that adaptively explores the advantages of self-supervised data augmentation.
arXiv Detail & Related papers (2024-12-18T13:12:36Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.<n>Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
arXiv Detail & Related papers (2024-05-26T13:11:55Z) - Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z) - Common-Sense Bias Modeling for Classification Tasks [15.683471433842492]
We propose a novel framework to extract comprehensive biases in image datasets based on textual descriptions.<n>Our method uncovers novel model biases in multiple image benchmark datasets.<n>The discovered bias can be mitigated by simple data re-weighting to de-correlate the features.
arXiv Detail & Related papers (2024-01-24T03:56:07Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented
Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG)
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose sc FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - MoReL: Multi-omics Relational Learning [26.484803417186384]
We propose a novel deep Bayesian generative model to efficiently infer a multi-partite graph encoding molecular interactions across heterogeneous views.
With such an optimal transport regularization in the deep Bayesian generative model, it not only allows incorporating view-specific side information, but also increases the model flexibility with the distribution-based regularization.
arXiv Detail & Related papers (2022-03-15T02:50:07Z) - Variational Model Inversion Attacks [26.613251410498755]
In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network.
A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset.
In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy.
arXiv Detail & Related papers (2022-01-26T07:39:13Z) - Variational Selective Autoencoder: Learning from Partially-Observed
Heterogeneous Data [45.23338389559936]
We propose the variational selective autoencoder (VSAE) to learn representations from partially-observed heterogeneous data.
VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask.
It results in a unified model for various downstream tasks including data generation and imputation.
arXiv Detail & Related papers (2021-02-25T04:39:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.