Related papers: Fed-urlBERT: Client-side Lightweight Federated Transformers for URL Threat Analysis

URL: http://arxiv.org/abs/2312.03636v1
Date: Wed, 6 Dec 2023 17:31:16 GMT
Title: Fed-urlBERT: Client-side Lightweight Federated Transformers for URL Threat Analysis
Authors: Yujie Li, Yanbin Wang, Haitao Xu, Zhenhao Guo, Fan Zhang, Ruitong Liu, Wenrui Ma
Abstract summary: Fed-urlBERT is a federated URL pre-trained model designed to address both privacy concerns and the need for cross-domain collaboration in cybersecurity. Our approach achieves performance comparable to a centralized model under both independently and identically distributed (IID) and two non-IID data scenarios.
Score: 6.552094912099549
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In evolving cyber landscapes, the detection of malicious URLs calls for cooperation and knowledge sharing across domains. However, collaboration is often hindered by concerns over privacy and business sensitivities. Federated learning addresses these issues by enabling multi-client collaboration without direct data exchange. Unfortunately, if highly expressive Transformer models are used, clients may face intolerable computational burdens, and the exchange of weights could quickly deplete network bandwidth. In this paper, we propose Fed-urlBERT, a federated URL pre-trained model designed to address both privacy concerns and the need for cross-domain collaboration in cybersecurity. Fed-urlBERT leverages split learning to divide the pre-training model into client and server parts, so that the client part consumes fewer computational resources and less bandwidth. Our approach achieves performance comparable to a centralized model under both independently and identically distributed (IID) and two non-IID data scenarios. Significantly, our federated model shows about a 7% decrease in the FPR compared to the centralized model. Additionally, we implement an adaptive local aggregation strategy that mitigates heterogeneity among clients, demonstrating promising performance improvements. Overall, our study validates the applicability of the proposed Transformer federated learning for URL threat analysis, establishing a foundation for real-world collaborative cybersecurity efforts. The source code is accessible at https://github.com/Davidup1/FedURLBERT.
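
Illustrative sketch (not the authors' code): the abstract describes splitting the model into a client part and a server part. The toy example below shows the general shape of such a split training loop, assuming a linear client part, a logistic server part, and plain averaging of the client-side weights. Every function name, the data, and the hyperparameters are hypothetical. Fed-urlBERT itself uses a Transformer, and its adaptive local aggregation strategy is not reproduced here.

```python
import math
import random

def client_forward(w_client, x):
    """Client part: map raw features to one cut-layer activation."""
    return sum(w * xi for w, xi in zip(w_client, x))

def server_step(server, h, y, lr):
    """Server part: finish the forward pass, update itself, return dLoss/dh."""
    z = server["w"] * h + server["b"]
    p = 1.0 / (1.0 + math.exp(-z))
    dz = p - y                    # gradient of binary cross-entropy w.r.t. z
    grad_h = dz * server["w"]     # computed before the server weight changes
    server["w"] -= lr * dz * h
    server["b"] -= lr * dz
    return grad_h

def client_backward(w_client, x, grad_h, lr):
    """Client part: apply the cut-layer gradient received from the server."""
    return [w - lr * grad_h * xi for w, xi in zip(w_client, x)]

def average(weight_lists):
    """Element-wise mean of client-side weights (plain averaging, an assumption)."""
    n = len(weight_lists)
    return [sum(ws) / n for ws in zip(*weight_lists)]

def train(client_data, dim, rounds=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    global_client = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    server = {"w": 1.0, "b": 0.0}  # one shared server part, updated sequentially
    for _ in range(rounds):
        local_weights = []
        for data in client_data:
            w = list(global_client)  # each client starts from the shared client part
            for x, y in data:
                h = client_forward(w, x)                 # only h leaves the client
                grad_h = server_step(server, h, y, lr)   # only grad_h comes back
                w = client_backward(w, x, grad_h, lr)
            local_weights.append(w)
        global_client = average(local_weights)
    return global_client, server

def predict(w_client, server, x):
    h = client_forward(w_client, x)
    return 1.0 / (1.0 + math.exp(-(server["w"] * h + server["b"])))

# Two hypothetical clients; label 1 when the first feature dominates.
clients = [
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    [([2.0, 0.5], 1), ([0.5, 2.0], 0)],
]
w_client, server = train(clients, dim=2)
print(predict(w_client, server, [1.0, 0.0]), predict(w_client, server, [0.0, 1.0]))
```

In this sketch the raw samples never leave the client: only the cut-layer activation and its gradient cross the boundary, and only the small client part is averaged across clients, which is the property the abstract relies on to reduce client computation and bandwidth.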

Related papers

Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation [1.519321208145928]
Federated Learning (FL) offers a promising framework for individuals aiming to collaboratively develop a shared model. Variations in data distribution among clients can profoundly affect FL methodologies, primarily due to instabilities in the aggregation process. We propose a novel FL framework that combines individual parameter pruning and regularization techniques to make individual clients' models more robust to aggregation.
arXiv Detail & Related papers (2024-12-19T16:22:37Z)
Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling [6.260747047974035]
Federated Learning (FL) enables clients to train a joint model without disclosing their local data. Recently, the source inference attack (SIA) has been proposed where an honest-but-curious central server tries to identify exactly which client owns a specific data record. We propose a defense against SIAs by using a trusted shuffler, without compromising the accuracy of the joint model.
arXiv Detail & Related papers (2024-11-10T13:17:11Z)
FedCAP: Robust Federated Learning via Customized Aggregation and Personalization [13.17735010891312]
Federated learning (FL) has been applied to various privacy-preserving scenarios. We propose FedCAP, a robust FL framework against both data heterogeneity and Byzantine attacks. We show that FedCAP performs well in several non-IID settings and shows strong robustness under a series of poisoning attacks.
arXiv Detail & Related papers (2024-10-16T23:01:22Z)
Federated Instruction Tuning of LLMs with Domain Coverage Augmentation [87.49293964617128]
Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data together with various strategies of instruction augmentation. We propose FedDCA, which optimizes domain coverage through greedy client center selection and retrieval-based augmentation. For client-side computational efficiency and system scalability, FedDCA*, a variant of FedDCA, utilizes heterogeneous encoders with server-side feature alignment.
arXiv Detail & Related papers (2024-09-30T09:34:31Z)
ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks [26.002975401820887]
Federated Learning (FL) is a distributed learning framework designed for privacy-aware applications. Traditional FL approaches risk exposing sensitive client data when plain model updates are transmitted to the server. Google's Secure Aggregation (SecAgg) protocol addresses this threat by employing a double-masking technique. We propose ACCESS-FL, a communication-and-computation-efficient secure aggregation method.
arXiv Detail & Related papers (2024-09-03T09:03:38Z)
Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat. Traditional forgery detection methods train directly on centralized data. The paper proposes a novel federated face forgery detection learning method with personalized representation.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
Boosting Communication Efficiency of Federated Learning's Secure Aggregation [22.943966056320424]
Federated Learning (FL) is a decentralized machine learning approach where client devices train models locally and send them to a server. FL is vulnerable to model inversion attacks, where the server can infer sensitive client data from trained models. Google's Secure Aggregation (SecAgg) protocol addresses this data privacy issue by masking each client's trained model. This poster introduces a Communication-Efficient Secure Aggregation (CESA) protocol that substantially reduces this overhead.
arXiv Detail & Related papers (2024-05-02T10:00:16Z)
Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly with a central server. Recent studies have revealed that FL is vulnerable to gradient inversion attacks (GIA), which aim to reconstruct the original training samples. We propose Client-side poisoning Gradient Inversion (CGI), a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z)
Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training. In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework. Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z)
Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants. Many attacks have shown that it is still possible to infer sensitive information such as membership, property, or outright reconstruction of participant data. We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
Comfetch: Federated Learning of Large Networks on Constrained Clients via Sketching [28.990067638230254]
Federated learning (FL) is a popular paradigm for private and collaborative model training on the edge. We propose a novel algorithm, Comfetch, which allows clients to train large networks using representations of the global neural network.
arXiv Detail & Related papers (2021-09-17T04:48:42Z)
Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy. We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server. We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z)
Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients. However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training. We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.