Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
- URL: http://arxiv.org/abs/2403.02780v5
- Date: Fri, 08 Aug 2025 10:55:06 GMT
- Title: Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
- Authors: Keiyu Nosaka, Yuichi Takano, Akiko Yoshise,
- Abstract summary: Data Collaboration (DC) enables multiple parties to jointly train a model without exposing their private datasets.<n>Existing theory asserts that any target basis spanning the same subspace as the secret bases should suffice.<n>We introduce Orthonormal Data Collaboration (ODC), a novel DC framework that explicitly enforces orthonormality constraints on both the secret and target bases.
- Score: 2.928964540437144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data Collaboration (DC) enables multiple parties to jointly train a model without exposing their private datasets. Each party privately transforms its data using a secret linear basis and shares only the resulting intermediate representations. Existing theory asserts that any target basis spanning the same subspace as the secret bases should suffice; however, empirical evidence reveals that the particular choice of target basis significantly influences model accuracy and stability. In this paper, we introduce Orthonormal Data Collaboration (ODC), a novel DC framework that explicitly enforces orthonormality constraints on both the secret and target bases. Under these constraints, the basis alignment step reduces precisely to the classical Orthogonal Procrustes Problem, admitting a closed-form solution. We rigorously establish that the resulting orthonormal change-of-basis matrices achieve orthogonal concordance, aligning all parties' intermediate representations up to a common orthogonal transformation. Consequently, downstream model performance becomes invariant to the specific choice of orthonormal target basis. Computationally, ODC substantially reduces alignment complexity from O(\min\{a,(cl)^2,a^2cl) to O(acl^2) where a denotes anchor data size, l the latent dimension, and c the number of collaborating parties. Extensive empirical evaluations confirm the theoretical advantages of ODC, demonstrating alignment speed-ups of up to two orders of magnitude compared to state-of-the-art DC methods, alongside comparable or superior accuracy across multiple benchmark datasets. ODC maintains robust privacy under the semi-honest threat model and requires only a single round of communication. These results establish ODC as a practically advantageous and computationally efficient enhancement to existing DC pipelines, particularly when orthonormal secret bases are naturally feasible.
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric.<n>We develop a unified uniform-stability framework for the Gradient Push (SGP) algorithm.<n>A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB)<n>Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias by point-wise loss function.<n>We present a concrete solution that simultaneously achieves both principles via DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - WUSH: Near-Optimal Adaptive Transforms for LLM Quantization [52.77441224845925]
Quantization to low bitwidth is a standard approach for deploying large language models.<n>A few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer.<n>We derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization.
arXiv Detail & Related papers (2025-11-30T16:17:34Z) - A decoupled alignment kernel for peptide membrane permeability predictions [35.849562641740754]
We propose a monomer-aware decoupled global alignment kernel (MD-GAK), which couples chemically meaningful residue-residue similarity with sequence alignment.<n>We also introduce a variant, PMD-GAK, which incorporates a triangular positional prior.<n>Since our focus is on uncertainty estimation, we use Gaussian Processes as the predictive model, as both MD-GAK and PMD-GAK can be directly applied within this framework.
arXiv Detail & Related papers (2025-11-26T16:40:41Z) - Coresets from Trajectories: Selecting Data via Correlation of Loss Differences [14.31847187460321]
Correlation of Loss Differences (CLD) is a scalable metric for coreset selection.<n>On CIFAR-100 and ImageNet-1k, CLD-based coresets typically outperform or closely match state-of-the-art methods.
arXiv Detail & Related papers (2025-08-27T19:18:39Z) - When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts [5.839411310096219]
Semi-supervised domain adaptation (SSDA) aims to achieve high predictive performance in the target domain with limited labeled target data.<n>We develop a theoretical framework based on structural causal models (SCMs) which allows us to analyze and quantify the performance of SSDA methods when labeled target data is limited.<n>We propose the Multi Adaptive-Start Fine-Tuning (MASFT) algorithm, which fine-tunes UDA models from multiple starting points and selects the best-performing one.
arXiv Detail & Related papers (2025-07-19T15:18:28Z) - PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure.<n>We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z) - On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift [58.91436551466064]
This paper investigates the interconnections among three fundamental problems, calibration, and quantification, under dataset shift conditions.<n>We show that access to an oracle for any one of these tasks enables the resolution of the other two.<n>We propose new methods for each problem based on direct adaptations of well-established methods borrowed from the other disciplines.
arXiv Detail & Related papers (2025-05-16T15:42:55Z) - Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning [8.324857108715007]
This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL)<n>Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space.<n>A detailed performance analysis shows that the proposed DRL framework effectively approximates the performance of a centralized node.
arXiv Detail & Related papers (2025-03-20T14:39:21Z) - The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z) - Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
arXiv Detail & Related papers (2024-04-24T22:58:42Z) - New Solutions Based on the Generalized Eigenvalue Problem for the Data Collaboration Analysis [10.58933333850352]
Data Collaboration Analysis (DCA) is noted for its efficiency in terms of computational cost and communication load.
Existing optimization problems for determining the necessary collaborative functions have faced challenges.
This research addresses these issues by formulating the optimization problem through the segmentation of matrices into column vectors.
arXiv Detail & Related papers (2024-04-22T13:26:42Z) - State-of-the-Art Approaches to Enhancing Privacy Preservation of Machine Learning Datasets: A Survey [0.0]
This paper examines the evolving landscape of machine learning (ML) and its profound impact across various sectors.
It focuses on the emerging field of Privacy-preserving Machine Learning (PPML)
As ML applications become increasingly integral to industries like telecommunications, financial technology, and surveillance, they raise significant privacy concerns.
arXiv Detail & Related papers (2024-02-25T17:31:06Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS)
In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - A Robust Negative Learning Approach to Partial Domain Adaptation Using
Source Prototypes [0.8895157045883034]
This work proposes a robust Partial Domain Adaptation (PDA) framework that mitigates the negative transfer problem.
It includes diverse, complementary label feedback, alleviating the effect of incorrect feedback and promoting pseudo-label refinement.
We conducted a series of comprehensive experiments, including an ablation analysis, covering a range of partial domain adaptation tasks.
arXiv Detail & Related papers (2023-09-07T07:26:27Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - Divide and Contrast: Source-free Domain Adaptation via Adaptive
Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z) - Memory Consistent Unsupervised Off-the-Shelf Model Adaptation for
Source-Relaxed Medical Image Segmentation [13.260109561599904]
Unsupervised domain adaptation (UDA) has been a vital protocol for migrating information learned from a labeled source domain to an unlabeled heterogeneous target domain.
We propose "off-the-shelf (OS)" UDA (OSUDA), aimed at image segmentation, by adapting an OS segmentor trained in a source domain to a target domain, in the absence of source domain data in adaptation.
arXiv Detail & Related papers (2022-09-16T13:13:50Z) - On Certifying and Improving Generalization to Unseen Domains [87.00662852876177]
Domain Generalization aims to learn models whose performance remains high on unseen domains encountered at test-time.
It is challenging to evaluate DG algorithms comprehensively using a few benchmark datasets.
We propose a universal certification framework that can efficiently certify the worst-case performance of any DG method.
arXiv Detail & Related papers (2022-06-24T16:29:43Z) - Balancing Discriminability and Transferability for Source-Free Domain
Adaptation [55.143687986324935]
Conventional domain adaptation (DA) techniques aim to improve domain transferability by learning domain-invariant representations.
The requirement of simultaneous access to labeled source and unlabeled target renders them unsuitable for the challenging source-free DA setting.
We derive novel insights to show that a mixup between original and corresponding translated generic samples enhances the discriminability-transferability trade-off.
arXiv Detail & Related papers (2022-06-16T09:06:22Z) - Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since involved data, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z) - Source-Free Domain Adaptation via Distribution Estimation [106.48277721860036]
Domain Adaptation aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain whose data distributions are different.
Recently, Source-Free Domain Adaptation (SFDA) has drawn much attention, which tries to tackle domain adaptation problem without using source data.
In this work, we propose a novel framework called SFDA-DE to address SFDA task via source Distribution Estimation.
arXiv Detail & Related papers (2022-04-24T12:22:19Z) - Efficient Logistic Regression with Local Differential Privacy [0.0]
Internet of Things devices are expanding rapidly and generating huge amount of data.
There is an increasing need to explore data collected from these devices.
Collaborative learning provides a strategic solution for the Internet of Things settings but also raises public concern over data privacy.
arXiv Detail & Related papers (2022-02-05T22:44:03Z) - Linear Model with Local Differential Privacy [0.225596179391365]
Privacy preserving techniques have been widely studied to analyze distributed data across different agencies.
Secure multiparty computation has been widely studied for privacy protection with high privacy level but intense cost.
matrix masking technique is applied to encrypt data such that the secure schemes are against malicious adversaries.
arXiv Detail & Related papers (2022-02-05T01:18:00Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Deformation Robust Roto-Scale-Translation Equivariant CNNs [10.44236628142169]
Group-equivariant convolutional neural networks (G-CNNs) achieve significantly improved generalization performance with intrinsic symmetry.
General theory and practical implementation of G-CNNs have been studied for planar images under either rotation or scaling transformation.
arXiv Detail & Related papers (2021-11-22T03:58:24Z) - Adapting Off-the-Shelf Source Segmenter for Target Medical Image
Segmentation [12.703234995718372]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled and unseen target domain.
Access to the source domain data at the adaptation stage is often limited, due to data storage or privacy issues.
We propose to adapt an off-the-shelf" segmentation model pre-trained in the source domain to the target domain.
arXiv Detail & Related papers (2021-06-23T16:16:55Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain
Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z) - Unsupervised Domain Adaptation via Structurally Regularized Deep
Clustering [35.008158504090176]
Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution shifts from the target one.
We propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data.
We term our proposed method as Structurally Regularized Deep Clustering (SRDC), where we also enhance target discrimination with clustering of intermediate network features.
arXiv Detail & Related papers (2020-03-19T07:26:41Z) - D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis
for Multi-view High-dimensional Data [11.184915338554422]
A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views.
We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA)
Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables.
arXiv Detail & Related papers (2020-01-09T06:35:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.