Osmotic Learning: A Self-Supervised Paradigm for Decentralized Contextual Data Representation
- URL: http://arxiv.org/abs/2512.23096v1
- Date: Sun, 28 Dec 2025 22:25:16 GMT
- Title: Osmotic Learning: A Self-Supervised Paradigm for Decentralized Contextual Data Representation
- Authors: Mario Colosi, Reza Farahani, Maria Fazio, Radu Prodan, Massimo Villari
- Abstract summary: This paper introduces OSM-L, a self-supervised distributed learning paradigm designed to uncover higher-level latent knowledge from distributed data. OSM-L iteratively aligns local data representations, enabling information diffusion and convergence. Experimental results confirm OSM-L's convergence and representation capabilities on structured datasets.
- Score: 1.5329374712396715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data within a specific context gains deeper significance beyond its isolated interpretation. In distributed systems, interdependent data sources reveal hidden relationships and latent structures, representing valuable information for many applications. This paper introduces Osmotic Learning (OSM-L), a self-supervised distributed learning paradigm designed to uncover higher-level latent knowledge from distributed data. The core of OSM-L is osmosis, a process that synthesizes dense and compact representation by extracting contextual information, eliminating the need for raw data exchange between distributed entities. OSM-L iteratively aligns local data representations, enabling information diffusion and convergence into a dynamic equilibrium that captures contextual patterns. During training, it also identifies correlated data groups, functioning as a decentralized clustering mechanism. Experimental results confirm OSM-L's convergence and representation capabilities on structured datasets, achieving over 0.99 accuracy in local information alignment while preserving contextual integrity.
Related papers
- Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z)
- Harmonizing Generalization and Personalization in Ring-topology Decentralized Federated Learning [41.4210010333948]
We introduce Ring-topology Decentralized Federated Learning (RDFL) for distributed model training, aiming to avoid the inherent risks of centralized failure in server-based FL. RDFL faces the challenge of low information-sharing efficiency due to the point-to-point communication manner when handling inherent data heterogeneity. We propose a Divide-and-conquer RDFL framework (DRDFL) that uses a feature generation model to extract personalized information and invariant shared knowledge from the underlying data distribution.
arXiv Detail & Related papers (2025-04-27T04:38:49Z)
- Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax [73.03684002513218]
We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution. We show that such modification allows for learning uniformly and normally distributed representations. The results indicate a moderate trade-off between the performance on the downstream tasks and quality of DM.
arXiv Detail & Related papers (2024-10-09T15:40:04Z)
- Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions [1.6385815610837167]
Federated Cluster-Wise Refinement (FedCRef) involves clients that collaboratively train models on clusters with similar data distributions.
In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy.
This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each.
arXiv Detail & Related papers (2024-08-20T09:05:44Z)
- Comparing the information content of probabilistic representation spaces [3.7277730514654555]
Probabilistic representation spaces convey information about a dataset and are shaped by factors such as the training data, network architecture, and loss function. We propose two information-theoretic measures to compare general probabilistic representation spaces. We demonstrate the utility of these measures in three case studies.
arXiv Detail & Related papers (2024-05-31T17:33:07Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Input-Output Balanced Framework for Long-tailed LiDAR Semantic Segmentation [12.639524717464509]
We propose an input-output balanced framework to handle the issue of long-tailed distribution.
For the input space, we synthesize these tailed instances from mesh models and faithfully simulate the position and density distribution of LiDAR scans.
For the output space, a multi-head block is proposed to group different categories based on their shapes and instance amounts.
arXiv Detail & Related papers (2021-03-26T05:42:11Z)
- Deep Fusion Clustering Network [38.540761683389135]
We propose a Deep Fusion Clustering Network (DFCN) for deep clustering.
In our network, an interdependency learning-based Structure and Attribute Information Fusion (SAIF) module is proposed to explicitly merge the representations learned by an autoencoder and a graph autoencoder.
Experiments on six benchmark datasets have demonstrated that the proposed DFCN consistently outperforms the state-of-the-art deep clustering methods.
arXiv Detail & Related papers (2020-12-15T09:37:59Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)