Disentangled Lottery Tickets: Identifying and Assembling Core and Specialist Subnetworks
- URL: http://arxiv.org/abs/2508.16915v2
- Date: Sun, 02 Nov 2025 04:49:45 GMT
- Title: Disentangled Lottery Tickets: Identifying and Assembling Core and Specialist Subnetworks
- Authors: Sadman Mohammad Nasif, Md Abrar Jahin, M. F. Mridha
- Abstract summary: The Lottery Ticket Hypothesis suggests that within large neural networks, there exist sparse, trainable "winning tickets". This paper proposes the Disentangled Lottery Ticket (DiLT) Hypothesis, which posits that the intersection mask represents a universal, task-agnostic "core" subnetwork. Experiments on ImageNet and fine-grained datasets such as Stanford Cars, using ResNet and Vision Transformer architectures, show that the "core" ticket provides superior transfer learning performance, the "specialist" tickets retain domain-specific features enabling modular assembly, and the full re-assembled "union" ticket outperforms COLT.
- Score: 0.2730969268472861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Lottery Ticket Hypothesis (LTH) suggests that within large neural networks, there exist sparse, trainable "winning tickets" capable of matching the performance of the full model, but identifying them through Iterative Magnitude Pruning (IMP) is computationally expensive. Recent work introduced COLT, an accelerator that discovers a "consensus" subnetwork by intersecting masks from models trained on disjoint data partitions; however, this approach discards all non-overlapping weights, assuming they are unimportant. This paper challenges that assumption and proposes the Disentangled Lottery Ticket (DiLT) Hypothesis, which posits that the intersection mask represents a universal, task-agnostic "core" subnetwork, while the non-overlapping difference masks capture specialized, task-specific "specialist" subnetworks. A framework is developed to identify and analyze these components using the Gromov-Wasserstein (GW) distance to quantify functional similarity between layer representations and reveal modular structures through spectral clustering. Experiments on ImageNet and fine-grained datasets such as Stanford Cars, using ResNet and Vision Transformer architectures, show that the "core" ticket provides superior transfer learning performance, the "specialist" tickets retain domain-specific features enabling modular assembly, and the full re-assembled "union" ticket outperforms COLT - demonstrating that non-consensus weights play a critical functional role. This work reframes pruning as a process for discovering modular, disentangled subnetworks rather than merely compressing models.
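The core/specialist/union decomposition described in the abstract reduces to boolean operations on binary pruning masks. The sketch below is illustrative only; the function and variable names are assumptions, not the authors' implementation, and the real framework additionally analyzes the components with Gromov-Wasserstein distances and spectral clustering.

```python
import numpy as np

def disentangle_masks(mask_a: np.ndarray, mask_b: np.ndarray):
    """Split two binary pruning masks (True = weight kept), obtained from
    models trained on disjoint data partitions, into DiLT components."""
    core = mask_a & mask_b           # consensus "core": kept by both models (COLT's ticket)
    specialist_a = mask_a & ~mask_b  # weights unique to partition A's model
    specialist_b = mask_b & ~mask_a  # weights unique to partition B's model
    union = mask_a | mask_b          # re-assembled "union" ticket
    return core, specialist_a, specialist_b, union

# Toy example on masks for a 2x3 weight tensor
a = np.array([[1, 1, 0], [0, 1, 1]], dtype=bool)
b = np.array([[1, 0, 0], [1, 1, 0]], dtype=bool)
core, spec_a, spec_b, union = disentangle_masks(a, b)
```

By construction the three components are disjoint and partition the union, which is why discarding the specialists (as COLT does) can lose task-specific capacity.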
Related papers
- Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data [2.5157688901171995]
The Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match the performance of their dense counterparts. We propose the Routing Lottery (RTL), an adaptive pruning framework that discovers multiple specialized subnetworks, called adaptive tickets, each tailored to a class, semantic cluster, or environmental condition. Our results recast pruning as a mechanism for aligning model structure with data heterogeneity, paving the way toward more modular and context-aware deep learning.
arXiv Detail & Related papers (2026-01-29T18:56:41Z) - White-Basilisk: A Hybrid Model for Code Vulnerability Detection [50.49233187721795]
We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance. White-Basilisk achieves competitive results in vulnerability detection tasks with a parameter count of only 200M. This research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
arXiv Detail & Related papers (2025-07-11T12:39:25Z) - Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis [5.881825061973424]
Graph Neural Networks (GNNs) show great promise for Network Intrusion Detection Systems (NIDS), but they suffer performance degradation due to distribution drift and lack robustness against realistic adversarial attacks. This work proposes a novel approach to enhance GNN robustness and generalization by employing Large Language Models (LLMs) in an agentic pipeline as simulated cybersecurity expert agents.
arXiv Detail & Related papers (2025-06-25T19:49:55Z) - Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns [0.0]
This study presents a hybrid architecture for credit card fraud detection that integrates a Mixture of Experts (MoE) framework with Recurrent Neural Networks (RNNs), Transformer encoders, and Autoencoders. The MoE framework dynamically assigns predictive responsibility among the experts, enabling adaptive and context-sensitive decision-making. The proposed hybrid system offers a scalable, modular, and regulation-aware approach to detecting increasingly sophisticated fraud patterns.
arXiv Detail & Related papers (2025-04-01T20:47:18Z) - LENS-XAI: Redefining Lightweight and Explainable Network Security through Knowledge Distillation and Variational Autoencoders for Scalable Intrusion Detection in Cybersecurity [0.0]
This study introduces the Lightweight Explainable Network Security framework (LENS-XAI), which combines robust intrusion detection with enhanced interpretability and scalability. This research contributes significantly to advancing IDS by addressing computational efficiency, feature interpretability, and real-world applicability.
arXiv Detail & Related papers (2025-01-01T10:00:49Z) - Scalable and Effective Negative Sample Generation for Hyperedge Prediction [55.9298019975967]
Hyperedge prediction is crucial for understanding complex multi-entity interactions in web-based applications.
Traditional methods often face difficulties in generating high-quality negative samples due to imbalance between positive and negative instances.
We present the scalable and effective negative sample generation for Hyperedge Prediction (SEHP) framework, which utilizes diffusion models to tackle these challenges.
arXiv Detail & Related papers (2024-11-19T09:16:25Z) - Advanced Financial Fraud Detection Using GNN-CL Model [13.5240775562349]
The innovative GNN-CL model proposed in this paper marks a breakthrough in the field of financial fraud detection.
It combines the advantages of graph neural networks (GNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.
A key novelty of this paper is the use of multilayer perceptrons (MLPs) to estimate node similarity.
arXiv Detail & Related papers (2024-07-09T03:59:06Z) - Masked Completion via Structured Diffusion with White-Box Transformers [23.07048591213815]
We provide the first instantiation of the white-box design paradigm that can be applied to large-scale unsupervised representation learning.
We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture.
CRATE-MAE demonstrates highly promising performance on large-scale imagery datasets.
arXiv Detail & Related papers (2024-04-03T04:23:01Z) - COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks [6.883139128255468]
This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve accuracy similar to that of the original unpruned network. We introduce a novel winning ticket called the Cyclic Overlapping Lottery Ticket (COLT), obtained by data splitting and cyclic retraining of the pruned network from scratch.
arXiv Detail & Related papers (2022-12-24T16:38:59Z) - Semantic-aware Modular Capsule Routing for Visual Question Answering [55.03883681191765]
We propose a Semantic-aware modUlar caPsulE framework, termed as SUPER, to better capture the instance-specific vision-semantic characteristics.
We comparatively justify the effectiveness and generalization ability of our proposed SUPER scheme over five benchmark datasets.
arXiv Detail & Related papers (2022-07-21T10:48:37Z) - SeqTR: A Simple yet Universal Network for Visual Grounding [88.03253818868204]
We propose a simple yet universal network termed SeqTR for visual grounding tasks.
We cast visual grounding as a point prediction problem conditioned on image and text inputs.
Under this paradigm, visual grounding tasks are unified in our SeqTR network without task-specific branches or heads.
arXiv Detail & Related papers (2022-03-30T12:52:46Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets [127.56361320894861]
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
arXiv Detail & Related papers (2022-02-09T21:33:51Z) - Robustness Certificates for Implicit Neural Networks: A Mixed Monotone
Contractive Approach [60.67748036747221]
Implicit neural networks offer competitive performance and reduced memory consumption.
They can remain brittle with respect to input adversarial perturbations.
This paper proposes a theoretical and computational framework for robustness verification of implicit neural networks.
arXiv Detail & Related papers (2021-12-10T03:08:55Z) - Federated Learning with Unreliable Clients: Performance Analysis and
Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has raised keen attention to identifying sparse, trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Bespoke vs. Prêt-à-Porter Lottery Tickets: Exploiting Mask
Similarity for Trainable Sub-Network Finding [0.913755431537592]
Lottery Tickets are sparse sub-networks within over-parametrized networks.
We propose a consensus-based method for generating refined lottery tickets.
We successfully train these sub-networks to performance comparable to that of ordinary lottery tickets.
arXiv Detail & Related papers (2020-07-06T22:48:35Z) - ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z) - Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve
Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
Inspired by the Expectation-Maximization, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.