Handling Covariate Mismatch in Federated Linear Prediction
- URL: http://arxiv.org/abs/2602.02083v1
- Date: Mon, 02 Feb 2026 13:29:36 GMT
- Title: Handling Covariate Mismatch in Federated Linear Prediction
- Authors: Alexis Ayme, Rémi Khellaf
- Abstract summary: Federated learning enables institutions to train predictive models collaboratively without sharing raw data. Most existing methods assume that all clients measure the same features. We formalize learning a linear predictor under client-wise MCAR patterns and develop two modular approaches.
- Score: 2.5782420501870296
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated learning enables institutions to train predictive models collaboratively without sharing raw data, addressing privacy and regulatory constraints. In the standard horizontal setting, clients hold disjoint cohorts of individuals and collaborate to learn a shared predictor. Most existing methods, however, assume that all clients measure the same features. We study the more realistic setting of covariate mismatch, where each client observes a different subset of features, as typically arises in multicenter collaborations with no prior agreement on data collection. We formalize learning a linear predictor under client-wise MCAR (missing completely at random) patterns and develop two modular approaches tailored to the dimensional regime and communication budget. In the low-dimensional setting, we propose a plug-in estimator that approximates the oracle linear predictor by aggregating sufficient statistics to estimate the covariance and cross-moment terms. In higher dimensions, we study an impute-then-regress strategy: (i) impute missing covariates using any exchangeability-preserving imputation procedure, and (ii) fit a ridge-regularized linear model on the completed data. We provide asymptotic and finite-sample learning rates for our predictors, explicitly characterizing how they depend on the global dimension, the client-specific feature partition, and the distribution of samples across sites.
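A minimal sketch of the two strategies in the abstract, assuming client-wise MCAR masks. All names (clients, obs, plug_in_predictor, etc.) are illustrative, and the paper's exact normalization, debiasing, and communication protocol are not reproduced: the plug-in route pools per-client second moments and cross-moments restricted to each client's observed features; the high-dimensional route fits ridge regression on imputed data.

```python
import numpy as np

def plug_in_predictor(clients, d, ridge=1e-8):
    """Aggregate per-client sufficient statistics over observed features,
    then solve for the linear predictor beta ~= E[X X^T]^{-1} E[X y]."""
    S = np.zeros((d, d))        # accumulated second-moment entries
    b = np.zeros(d)             # accumulated cross-moment entries
    n_pair = np.zeros((d, d))   # samples observing each feature pair
    for X, y, obs in clients:   # obs: indices of features the client measures
        idx = np.ix_(obs, obs)
        S[idx] += X.T @ X
        b[obs] += X.T @ y
        n_pair[idx] += X.shape[0]
    # Normalize entrywise by the number of samples observing each term;
    # never-observed pairs stay zero, and the ridge term keeps the solve stable.
    S_hat = np.divide(S, n_pair, out=np.zeros_like(S), where=n_pair > 0)
    n_feat = np.diag(n_pair)
    b_hat = np.divide(b, n_feat, out=np.zeros(d), where=n_feat > 0)
    return np.linalg.solve(S_hat + ridge * np.eye(d), b_hat)

def impute_then_regress(X_imputed, y, lam=1.0):
    """High-dimensional alternative: after any exchangeability-preserving
    imputation, fit ridge regression on the completed design matrix."""
    d = X_imputed.shape[1]
    return np.linalg.solve(X_imputed.T @ X_imputed + lam * np.eye(d),
                           X_imputed.T @ y)
```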
Related papers
- Personalized federated prototype learning in mixed heterogeneous data scenarios [8.36422671527418]
Federated learning has received significant attention for its ability to simultaneously protect customer privacy and leverage distributed data from multiple devices for model training. We propose a new approach called PFPL for mixed heterogeneous scenarios. The method provides richer domain knowledge and unbiased convergence targets by constructing personalized, unbiased prototypes for each client.
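For context, prototype-based personalization typically builds on class prototypes, i.e., per-class means of feature embeddings; a generic sketch, not PFPL's exact construction:

```python
import numpy as np

def class_prototypes(features, labels):
    """Per-class mean of feature embeddings: the basic building block
    that personalized-prototype methods refine per client."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}
```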
arXiv Detail & Related papers (2025-10-04T08:08:32Z)
- Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning [0.9176056742068811]
Federated learning (FL) enables distributed training with private client data. Current ensemble-based FL methods fall short in capturing the diversity of model predictions. We propose SHEFL, a global ensemble-based FL framework suited for clients with diverse computational capacities.
arXiv Detail & Related papers (2025-08-12T01:40:46Z)
- Self-Interested Agents in Collaborative Machine Learning: An Incentivized Adaptive Data-Centric Framework [34.19393519060549]
We propose a framework for data-centric collaborative machine learning among self-interested agents. An arbiter collects a batch of data from agents, trains a machine learning model, and provides each agent with a distinct model reflecting its data contributions. This setup establishes a feedback loop where shared data influence model updates, and the resulting models guide future data-sharing policies.
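A toy rendition of this feedback loop, with a trivial stand-in "model" and a naive sharing policy; everything here is illustrative and does not reproduce the paper's incentive mechanism:

```python
import numpy as np

def run_rounds(agent_data, rounds=5, base_frac=0.5):
    """agent_data: dict mapping agent name -> 1-D array of samples."""
    frac = {a: base_frac for a in agent_data}   # each agent's sharing policy
    model = None
    for _ in range(rounds):
        # the arbiter collects a batch of data from each agent
        shared = {a: d[: int(frac[a] * len(d))] for a, d in agent_data.items()}
        pooled = np.concatenate([s for s in shared.values() if len(s) > 0])
        model = pooled.mean()                   # stand-in for model training
        # agents adapt future sharing based on their contribution share
        for a in agent_data:
            contribution = len(shared[a]) / len(pooled)
            frac[a] = min(1.0, frac[a] + 0.1 * contribution)
    return model, frac
```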
arXiv Detail & Related papers (2024-12-09T15:47:36Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over meta-analysis-based methods as heterogeneity increases.
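As background, the single-site inverse propensity score (IPW) estimator that such collaborative extensions build on; the paper's multi-site weighting is not reproduced here:

```python
import numpy as np

def ipw_ate(y, t, e):
    """Estimate the average treatment effect from outcomes y, binary
    treatments t, and propensity scores e = P(T = 1 | X)."""
    return np.mean(t * y / e - (1 - t) * y / (1 - e))
```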
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
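For reference, the standard focal loss (Lin et al., 2017) that such long-tail variants start from; the exact MECCANO-specific variant may differ:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Cross-entropy down-weighted for easy examples, which rebalances
    learning under long-tailed class distributions."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # model's probability for the true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()
```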
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
- Federated Variational Inference: Towards Improved Personalization and Generalization [2.37589914835055]
We study personalization and generalization in stateless cross-device federated learning setups.
We first propose a hierarchical generative model and formalize it using Bayesian inference.
We then approximate this process using Variational Inference to train our model efficiently.
We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks.
arXiv Detail & Related papers (2023-05-23T04:28:07Z)
- Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts [24.196279060605402]
We consider a federated learning (FL) system consisting of multiple clients and a server.
Unlike the conventional FL framework, which assumes the clients' data are static, we consider scenarios where the clients' data distributions may be reshaped by the deployed decision model.
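The core loop behind such performative settings is repeated retraining on a model-dependent distribution; a schematic sketch (the paper's federated algorithm and stability analysis are not reproduced):

```python
def repeated_risk_minimization(train, sample_from, theta0, rounds=10):
    """train: fits parameters to data; sample_from: draws data from the
    distribution induced by the currently deployed model."""
    theta = theta0
    for _ in range(rounds):
        data = sample_from(theta)  # clients' data react to the deployed model
        theta = train(data)        # refit on the shifted distribution
    return theta                   # iterates toward a performatively stable point
```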
arXiv Detail & Related papers (2023-05-08T23:29:24Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM has the additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
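To illustrate the GMM building block (not FedGMM's federated estimation itself): fit a mixture to a client's inputs, use responsibilities for soft assignment, and flag novel samples via low likelihood:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_client_mixture(X, n_components=3, seed=0):
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    resp = gmm.predict_proba(X)        # soft component assignments
    novelty = -gmm.score_samples(X)    # high values suggest novel samples
    return gmm, resp, novelty
```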
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information, such as membership or properties, or even to reconstruct participant data outright.
We show that simple linear models can effectively capture client-specific properties from the aggregated model updates alone.
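The attack idea reduces to fitting a linear classifier from observed aggregate updates to a client property; a schematic sketch with hypothetical inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def property_inference_attack(updates, property_labels):
    """updates: (n_observations, n_params) aggregated model updates;
    property_labels: the client-specific property to infer."""
    return LogisticRegression(max_iter=1000).fit(updates, property_labels)
```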
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
- Federated Learning with Uncertainty via Distilled Predictive Distributions [14.828509220023387]
We present a framework for federated learning with uncertainty where, in each round, each client infers the posterior distribution over its parameters as well as the posterior predictive distribution (PPD).
Unlike some of the recent Bayesian approaches to federated learning, our approach does not require sending the whole posterior distribution of the parameters from each client to the server.
Our approach does not make restrictive assumptions, such as on the form of the clients' posterior distributions or of their PPDs.
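For reference, the Monte-Carlo form of a posterior predictive distribution, the object each client distills; the distillation step itself is omitted:

```python
import numpy as np

def ppd(predict_proba, theta_samples, x):
    """Average the likelihood over posterior samples:
    p(y | x) ~= (1/S) * sum_s p(y | x, theta_s)."""
    return np.mean([predict_proba(theta, x) for theta in theta_samples], axis=0)
```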
arXiv Detail & Related papers (2022-06-15T14:24:59Z)
- Cooperative learning for multi-view analysis [2.368995563245609]
We propose a new method for supervised learning with multiple sets of features ("views").
Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree.
We illustrate the effectiveness of our proposed method on simulated and real data examples.
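For two views with predictions fx and fz, the objective combines fit and agreement; a direct transcription of the description above:

```python
import numpy as np

def cooperative_loss(y, fx, fz, rho):
    """Squared-error fit plus an agreement penalty of weight rho that
    pulls the two views' predictions toward each other."""
    fit = 0.5 * np.sum((y - fx - fz) ** 2)
    agree = 0.5 * rho * np.sum((fx - fz) ** 2)
    return fit + agree
```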
arXiv Detail & Related papers (2021-12-23T03:13:25Z)
- Federated Learning with Heterogeneous Data: A Superquantile Optimization Approach [0.0]
We present a federated learning framework that is designed to robustly deliver good performance across individual clients with heterogeneous data.
The proposed approach hinges upon a superquantile-based learning objective that captures the tail statistics of the error distribution.
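The superquantile (conditional value-at-risk) of the per-client losses is the tail statistic such an objective targets; a plain empirical version:

```python
import numpy as np

def superquantile(losses, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of losses."""
    losses = np.asarray(losses)
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()
```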
arXiv Detail & Related papers (2021-12-17T11:00:23Z)
- Test-time Collective Prediction [73.74982509510961]
In many machine learning settings, multiple parties want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
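One naive instantiation of such test-time collective prediction is confidence-weighted averaging of the agents' pre-trained models; the paper's actual mechanism may differ:

```python
import numpy as np

def collective_predict(models, x):
    """models: callables mapping x to class-probability vectors."""
    probs = np.array([m(x) for m in models])
    w = probs.max(axis=1)                     # each agent's confidence
    w = w / w.sum()
    return (w[:, None] * probs).sum(axis=0)   # combined prediction
```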
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over the model parameters, and propose an effective and efficient method to estimate this metric.
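For intuition, the exact leave-one-out influence that such estimators approximate cheaply; a brute-force sketch:

```python
import numpy as np

def loo_influence(train, clients, i):
    """Influence of client i as the parameter shift from retraining
    without that client; exact but expensive."""
    theta_full = np.asarray(train(clients))
    theta_loo = np.asarray(train(clients[:i] + clients[i + 1:]))
    return np.linalg.norm(theta_full - theta_loo)
```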
arXiv Detail & Related papers (2020-12-20T14:34:36Z)