Related papers: Semi-Supervised Online Learning on the Edge by Transforming Knowledge from Teacher Models

URL: http://arxiv.org/abs/2512.16866v1
Date: Thu, 18 Dec 2025 18:37:28 GMT
Title: Semi-Supervised Online Learning on the Edge by Transforming Knowledge from Teacher Models
Authors: Jiabin Xue,
Abstract summary: Edge machine learning (Edge ML) enables training ML models using the vast data distributed across network edges. Online Edge ML allows models to be trained directly on edge devices and updated continuously with new data. We propose Knowledge Transformation (KT), a hybrid method combining Knowledge Distillation, Active Learning, and causal reasoning.
Score: 1.6490670414281121
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Edge machine learning (Edge ML) enables training ML models using the vast data distributed across network edges. However, many existing approaches assume static models trained centrally and then deployed, making them ineffective against unseen data. To address this, Online Edge ML allows models to be trained directly on edge devices and updated continuously with new data. This paper explores a key challenge of Online Edge ML: "How to determine labels for truly future, unseen data points". We propose Knowledge Transformation (KT), a hybrid method combining Knowledge Distillation, Active Learning, and causal reasoning. In short, KT acts as the oracle in active learning by transforming knowledge from a teacher model to generate pseudo-labels for training a student model. To verify the validity of the method, we conducted simulation experiments with two setups: (1) using a less stable teacher model and (2) a relatively more stable teacher model. Results indicate that when a stable teacher model is given, the student model can eventually reach its expected maximum performance. KT is potentially beneficial for scenarios that meet the following circumstances: (1) when the teacher's task is generic, which means existing pre-trained models might be adequate for its task, so there will be no need to train the teacher model from scratch; and/or (2) when the label for the student's task is difficult or expensive to acquire.
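The teacher-to-student flow described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the rule-based `teacher_predict`, the identity-style `transform`, the `OnlineLogisticStudent` class, and the synthetic two-feature stream are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of a teacher supplying pseudo-labels so that a student can be updated one unlabeled sample at a time.

```python
import math
import random


def teacher_predict(x):
    """Hypothetical stand-in for a pre-trained teacher: a fixed rule on feature 0."""
    return 1 if x[0] > 0.0 else 0


def transform(teacher_label):
    """Hypothetical mapping from teacher output to a student-task pseudo-label.

    Assumption: the identity mapping is used here; the paper's actual
    transformation is not specified in the abstract.
    """
    return teacher_label


class OnlineLogisticStudent:
    """Tiny logistic-regression student updated one sample at a time (SGD)."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def update(self, x, y):
        # Gradient of the log loss for a single (x, y) pair.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_stream(n_steps=2000, seed=0):
    """Simulate an unlabeled data stream; the teacher supplies every label."""
    rng = random.Random(seed)
    student = OnlineLogisticStudent(n_features=2)
    for _ in range(n_steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        pseudo_label = transform(teacher_predict(x))
        student.update(x, pseudo_label)
    return student


if __name__ == "__main__":
    student = run_stream()
    print("learned weights:", student.w, "bias:", student.b)
```

In this toy setup the student can only be as good as the pseudo-labels it receives, which mirrors the abstract's observation that a more stable teacher lets the student approach its expected maximum performance.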

Related papers

Model-to-Model Knowledge Transmission (M2KT): A Data-Free Framework for Cross-Model Understanding Transfer [0.0]
We introduce Model-to-Model Knowledge Transmission (M2KT), a novel paradigm for data-free conceptual transfer between neural networks. Unlike classical distillation, M2KT operates primarily in concept space rather than example space. M2KT can achieve approximately 85 to 90 percent of teacher performance while reducing data usage by over 98 percent compared to standard knowledge distillation.
arXiv Detail & Related papers (2025-11-19T09:43:25Z)
Matryoshka Model Learning for Improved Elastic Student Models [62.154536258259384]
MatTA is a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. We demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.
arXiv Detail & Related papers (2025-05-29T10:54:58Z)
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT. AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs. Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning. The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset. We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU). We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [65.57123249246358]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amount of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models [10.941519846908697]
We introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher. Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with more optimal guidance. Based on a many-to-one mapping property of the CTC algorithm, we present a training strategy that can effectively prevent the trivial solution.
arXiv Detail & Related papers (2021-11-05T14:14:05Z)
Certifiable Machine Unlearning for Linear Models [1.484852576248587]
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. We present an experimental study of the three state-of-the-art approximate unlearning methods for linear models.
arXiv Detail & Related papers (2021-06-29T05:05:58Z)
SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data. We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data. We then train a student model on both labels and pseudo labels to generate final feature embeddings.
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
From Learning to Meta-Learning: Reduced Training Overhead and Complexity for Communication Systems [40.427909614453526]
Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations. With a meta-trained inductive bias, training of a machine learning model can be potentially carried out with reduced training data and/or time complexity. This paper provides a high-level introduction to meta-learning with applications to communication systems.
arXiv Detail & Related papers (2020-01-05T12:54:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.