Related papers: Learning with Shared Representations: Statistical Rates and Efficient Algorithms

URL: http://arxiv.org/abs/2409.04919v2
Date: Tue, 21 Jan 2025 20:03:17 GMT
Title: Learning with Shared Representations: Statistical Rates and Efficient Algorithms
Authors: Xiaochun Niu, Lili Su, Jiaming Xu, Pengkun Yang
Abstract summary: Collaborative learning through latent shared representations enables heterogeneous clients to train personalized models with enhanced performance while reducing sample size. Despite its empirical success and extensive research, the theoretical understanding of statistical error rates remains incomplete, even for shared representations constrained to low-dimensional linear subspaces.
Score: 13.643155483461028
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Collaborative learning through latent shared feature representations enables heterogeneous clients to train personalized models with enhanced performance while reducing sample complexity. Despite its empirical success and extensive research, the theoretical understanding of statistical error rates remains incomplete, even for shared representations constrained to low-dimensional linear subspaces. In this paper, we establish new upper and lower bounds on the error for learning low-dimensional linear representations shared across clients. Our results account for both statistical heterogeneity (including covariate and concept shifts) and heterogeneity in local dataset sizes, a critical aspect often overlooked in previous studies. We further extend our error bounds to more general nonlinear models, including logistic regression and one-hidden-layer ReLU neural networks. More specifically, we design a spectral estimator that leverages independent replicas of local averaging to approximately solve the non-convex least squares problem. We derive a nearly matching minimax lower bound, proving that our estimator achieves the optimal statistical rate when the latent shared linear representation is well-represented across the entire dataset--that is, when no specific direction is disproportionately underrepresented. Our analysis reveals two distinct phases of the optimal rate: in typical cases, the rate matches the standard parameter-counting rate for the representation; however, a statistical penalty arises when the number of clients surpasses a certain threshold or the local dataset sizes fall below a threshold. These findings provide a more precise characterization of when collaboration benefits the overall system or individual clients in transfer learning and private fine-tuning.
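
The spectral estimator mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a linear model y = x^T B alpha_i + noise with isotropic Gaussian covariates, splits each client's data into two halves to obtain two independent local averages of y * x, and weights each client by its sample size. The function names, the weighting, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def spectral_subspace_estimate(client_data, k):
    """Estimate a shared k-dimensional subspace from per-client (X, y) pairs.

    Hypothetical helper: the half/half split and the sample-size weights
    are illustrative assumptions, not taken from the paper.
    """
    d = client_data[0][0].shape[1]
    M = np.zeros((d, d))
    for X, y in client_data:
        n = X.shape[0]
        half = n // 2
        if half == 0:
            continue  # a client needs at least two samples for two replicas
        # Two independent local averages of y * x (the "replicas").
        z1 = X[:half].T @ y[:half] / half
        z2 = X[half:].T @ y[half:] / (n - half)
        # Symmetrized cross product, weighted by the local sample size.
        M += n * (np.outer(z1, z2) + np.outer(z2, z1)) / 2.0
    # The eigenvectors of the k largest eigenvalues span the estimate.
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top]


def subspace_distance(B_hat, B):
    """Spectral norm of (I - B B^T) B_hat, a sine-theta style distance."""
    return np.linalg.norm(B_hat - B @ (B.T @ B_hat), ord=2)


# Simulated data (assumed setup): heterogeneous local sample sizes and
# client-specific coefficients alpha_i on a shared subspace B.
rng = np.random.default_rng(0)
d, k, num_clients = 20, 2, 500
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
clients = []
for _ in range(num_clients):
    n_i = int(rng.integers(20, 60))
    alpha = rng.standard_normal(k)
    X = rng.standard_normal((n_i, d))
    y = X @ (B @ alpha) + 0.1 * rng.standard_normal(n_i)
    clients.append((X, y))

B_hat = spectral_subspace_estimate(clients, k)
err = subspace_distance(B_hat, B)
print("subspace distance:", err)
```

Under the assumed model, the product of the two half-sample averages has expectation B alpha_i alpha_i^T B^T, so the noise does not add the diagonal bias that would appear if a single local average were multiplied by itself.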

Related papers

Learning a Class of Mixed Linear Regressions: Global Convergence under General Data Conditions [1.9295130374196499]
Mixed linear regression (MLR) has attracted increasing attention because of its great theoretical and practical importance in capturing nonlinear relationships by utilizing a mixture of linear regression sub-models. Although considerable efforts have been devoted to the learning problem of such systems, most existing investigations impose strict independent and identically distributed (i.i.d.) or persistent excitation (PE) conditions.
arXiv Detail & Related papers (2025-03-24T09:57:39Z)
Heterogeneity Matters even More in Distributed Learning: Study from Generalization Perspective [14.480713752871523]
$K$ clients each have $n$ training samples generated independently according to a possibly different data distribution. We study the effect of discrepancy between the clients' data distributions on the generalization error of the aggregated model. It is shown that the bound gets smaller as the degree of data heterogeneity across clients gets higher.
arXiv Detail & Related papers (2025-03-03T14:33:38Z)
High-dimensional logistic regression with missing data: Imputation, regularization, and universality [7.167672851569787]
We study high-dimensional, ridge-regularized logistic regression. We provide exact characterizations of both the prediction error and the estimation error.
arXiv Detail & Related papers (2024-10-01T21:41:21Z)
Fine-Tuning Personalization in Federated Learning to Mitigate Adversarial Clients [8.773068878015856]
Federated learning (FL) is an appealing paradigm that allows a group of machines (a.k.a. clients) to learn collectively while keeping their data local. We consider an FL setting where some clients can be adversarial, and we derive conditions under which full collaboration fails.
arXiv Detail & Related papers (2024-09-30T14:31:19Z)
Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. In many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolation in this setting are not well-understood. We establish a novel anisotropic local law to achieve these characterizations.
arXiv Detail & Related papers (2024-06-20T02:23:28Z)
How to Collaborate: Towards Maximizing the Generalization Performance in Cross-Silo Federated Learning [12.86056968708516]
Federated learning (FL) has attracted vivid attention as a privacy-preserving distributed learning framework. In this work, we focus on cross-silo FL, where clients become the model owners after training. We formulate that the performance of a client can be improved only by collaborating with other clients that have more training data.
arXiv Detail & Related papers (2024-01-24T05:41:34Z)
FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method. We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate. We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
FedSampling: A Better Sampling Strategy for Federated Learning [81.85411484302952]
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. We propose a novel data uniform sampling strategy for federated learning (FedSampling).
arXiv Detail & Related papers (2023-06-25T13:38:51Z)
FilFL: Client Filtering for Optimized Client Participation in Federated Learning [71.46173076298957]
Federated learning enables clients to collaboratively train a model without exchanging local data. Clients participating in the training process significantly impact the convergence rate, learning efficiency, and model generalization. We propose a novel approach, client filtering, to improve model generalization and optimize client participation and training.
arXiv Detail & Related papers (2023-02-13T18:55:31Z)
When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning [41.51682329500003]
We propose a novel learning rate adaptation mechanism to adjust the server learning rate for the aggregated gradient in each round. We make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate.
arXiv Detail & Related papers (2023-01-25T03:52:45Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles. Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
Multitask Learning and Bandits via Robust Statistics [3.103098467546532]
Decision-makers often simultaneously face many related but heterogeneous learning problems. We propose a novel two-stage multitask learning estimator that exploits this structure in a sample-efficient way. Our estimator yields improved sample complexity bounds in the feature dimension $d$ relative to commonly-employed estimators.
arXiv Detail & Related papers (2021-12-28T17:37:08Z)
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features. We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client. Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation. This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity [57.275753974812666]
Federated learning involves learning from data samples distributed across a network of clients while the data remains local. In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure.
arXiv Detail & Related papers (2020-12-28T19:21:14Z)
A Nonconvex Framework for Structured Dynamic Covariance Recovery [24.471814126358556]
We propose a flexible yet interpretable model for high-dimensional data with time-varying second order statistics. Motivated by the literature, we quantify factorization and smooth temporal data. We show that our approach outperforms existing baselines.
arXiv Detail & Related papers (2020-11-11T07:09:44Z)
An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior. We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs). In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that achieves better generalization and stability. Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.