Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
- URL: http://arxiv.org/abs/2511.20312v1
- Date: Tue, 25 Nov 2025 13:49:48 GMT
- Title: Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
- Authors: Alexander Beiser, Flavio Martinelli, Wulfram Gerstner, Johanni Brea
- Abstract summary: In a teacher-student setup, this translates into collecting a dataset of the teacher mapping -- querying the teacher -- and fitting a student to imitate such mapping. Current methods fail when the teacher parameters are more numerous than the training data, because the student overfits to the queries instead of aligning its parameters to the teacher. We explore augmentation techniques to best sample the input-output mapping of a teacher network, with the goal of eliciting a rich set of representations from the teacher hidden layers.
- Score: 47.58817034247679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network weights can be reverse-engineered given enough informative samples of a network's input-output function. In a teacher-student setup, this translates into collecting a dataset of the teacher mapping -- querying the teacher -- and fitting a student to imitate such mapping. A sensible choice of queries is the dataset the teacher is trained on. But current methods fail when the teacher parameters are more numerous than the training data, because the student overfits to the queries instead of aligning its parameters to the teacher. In this work, we explore augmentation techniques to best sample the input-output mapping of a teacher network, with the goal of eliciting a rich set of representations from the teacher hidden layers. We discover that standard augmentations such as rotation, flipping, and adding noise bring little to no improvement to the identification problem. We design new data augmentation techniques tailored to better sample the representational space of the network's hidden layers. With our augmentations we extend the state-of-the-art range of recoverable network sizes. To test their scalability, we show that we can recover networks of up to 100 times more parameters than training data-points.
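The teacher-student setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's code: the one-hidden-layer tanh architecture, the network sizes, the learning rate, and all function names (`forward`, `random_net`, `query_teacher`, `mse`, `gradient_step`) are hypothetical. Gaussian input noise stands in for "an augmentation" only because it is easy to write down; the abstract reports that such standard augmentations bring little to no improvement, and the tailored augmentations the authors design are not described here, so they are not reproduced.

```python
import math
import random

def forward(net, x):
    """One-hidden-layer tanh network: output = sum_j a[j] * tanh(W[j] . x)."""
    W, a = net
    return sum(a_j * math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
               for w_j, a_j in zip(W, a))

def random_net(n_in, n_hidden, rng):
    """Draw hidden weights W and output weights a from a standard normal."""
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    a = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return W, a

def query_teacher(teacher, inputs, noise_std, copies, rng):
    """Query the teacher on each input and on `copies` noisy variants of it."""
    dataset = []
    for x in inputs:
        variants = [x] + [[xi + rng.gauss(0, noise_std) for xi in x]
                          for _ in range(copies)]
        dataset.extend((v, forward(teacher, v)) for v in variants)
    return dataset

def mse(net, dataset):
    """Imitation loss: mean squared error between net outputs and teacher outputs."""
    return sum((forward(net, x) - y) ** 2 for x, y in dataset) / len(dataset)

def gradient_step(student, dataset, lr):
    """One full-batch gradient-descent step on the imitation loss."""
    W, a = student
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_a = [0.0] * len(a)
    for x, y in dataset:
        h = [math.tanh(sum(w * xi for w, xi in zip(w_j, x))) for w_j in W]
        err = 2.0 * (sum(a_j * h_j for a_j, h_j in zip(a, h)) - y) / len(dataset)
        for j in range(len(a)):
            grad_a[j] += err * h[j]
            for i in range(len(x)):
                grad_W[j][i] += err * a[j] * (1.0 - h[j] ** 2) * x[i]
    new_W = [[w - lr * g for w, g in zip(w_j, g_j)] for w_j, g_j in zip(W, grad_W)]
    new_a = [a_j - lr * g for a_j, g in zip(a, grad_a)]
    return new_W, new_a

rng = random.Random(0)
teacher = random_net(3, 4, rng)   # hypothetical teacher: 3 inputs, 4 hidden units
student = random_net(3, 4, rng)   # student with the same (assumed) architecture
inputs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
dataset = query_teacher(teacher, inputs, noise_std=0.5, copies=3, rng=rng)

loss_before = mse(student, dataset)
for _ in range(500):
    student = gradient_step(student, dataset, lr=0.01)
loss_after = mse(student, dataset)
print(len(dataset), loss_before, loss_after)
```

A low imitation loss on the queried points does not by itself mean the student's weights match the teacher's. That gap between fitting the queries and recovering the parameters is the overfitting problem the abstract describes.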
Related papers
- Hybrid Data-Free Knowledge Distillation [11.773963069904955]
We propose a data-free knowledge distillation method called Hybrid Data-Free Distillation (HiDFD). Our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.
arXiv Detail & Related papers (2024-12-18T05:52:16Z) - Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z) - Expand-and-Cluster: Parameter Recovery of Neural Networks [9.497862562614666]
We show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function.
Our novel method 'Expand-and-Cluster' can identify weights of a target network for all commonly used activation functions.
arXiv Detail & Related papers (2023-04-25T13:14:20Z) - Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples based on extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z) - UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
They require the images of a scene from different camera views to be available for one-time training.
This is expensive especially for scenarios with large-scale scenes and limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2022-12-21T11:43:20Z) - Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation [22.87106703794863]
It motivates distilling knowledge from the ensemble teacher into a smaller student network.
We propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers.
We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student.
arXiv Detail & Related papers (2022-06-30T06:23:03Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z) - SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
arXiv Detail & Related papers (2020-11-20T08:26:10Z) - Representation Transfer by Optimal Transport [34.77292648424614]
We use optimal transport to quantify the match between two representations.
This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher.
arXiv Detail & Related papers (2020-07-13T23:42:06Z) - Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.