Distributed Sparse Feature Selection in Communication-Restricted
Networks
- URL: http://arxiv.org/abs/2111.02802v1
- Date: Tue, 2 Nov 2021 05:02:24 GMT
- Title: Distributed Sparse Feature Selection in Communication-Restricted
Networks
- Authors: Hanie Barghi, Amir Najafi, and Seyed Abolfazl Motahari
- Abstract summary: We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
- Score: 6.9257380648471765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to propose and theoretically analyze a new distributed scheme
for sparse linear regression and feature selection. The primary goal is to
learn the few causal features of a high-dimensional dataset based on noisy
observations from an unknown sparse linear model. However, the presumed
training set which includes $n$ data samples in $\mathbb{R}^p$ is already
distributed over a large network with $N$ clients connected through extremely
low-bandwidth links. Also, we consider the asymptotic configuration of $1\ll
N\ll n\ll p$. In order to infer the causal dimensions from the whole dataset,
we propose a simple, yet effective method for information sharing in the
network. In this regard, we theoretically show that the true causal features
can be reliably recovered with negligible bandwidth usage of $O\left(N\log
p\right)$ across the network. This yields a significantly lower communication
cost in comparison with the trivial case of transmitting all the samples to a
single node (centralized scenario), which requires $O\left(np\right)$
transmissions. Even more sophisticated schemes such as ADMM still have a
communication complexity of $O\left(Np\right)$. Surprisingly, our sample
complexity bound is proved to be the same (up to a constant factor) as the
optimal centralized approach for a fixed performance measure in each node,
while that of a na\"{i}ve decentralized technique grows linearly with $N$.
Theoretical guarantees in this paper are based on the recent analytic framework
of debiased LASSO in Javanmard et al. (2019), and are supported by several
computer experiments performed on both synthetic and real-world datasets.
Related papers
- Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation.
Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds.
We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$dimensional hypercube.
We then demonstrate how a trained neural network with SGD, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs [42.551773746803946]
Vision tasks are characterized by the properties of locality and translation invariance.
The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture.
Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories.
arXiv Detail & Related papers (2024-03-23T03:57:28Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks [13.2844023993979]
Federated learning (FL) is a widely employed distributed paradigm for collaboratively machine learning models from multiple clients without sharing local data.
In this paper, we show that FedAvg converges to a global minimum at a global rate at a global focus.
arXiv Detail & Related papers (2023-10-09T07:56:56Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - High-Dimensional Inference over Networks: Linear Convergence and
Statistical Guarantees [20.701475313495884]
We study a sparse linear regression over a network of agents, modeled as an undirected graph and no server node.
We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z) - Besov Function Approximation and Binary Classification on
Low-Dimensional Manifolds Using Convolutional Residual Networks [42.43493635899849]
We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification.
Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
arXiv Detail & Related papers (2021-09-07T02:58:11Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $epsilon0$ is achieved master/workers networks in $Omegabig.
We then propose algorithms matching the lower bounds either types of networks (up to log-overs)
We assess effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax
Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.