Distilling Realizable Students from Unrealizable Teachers
- URL: http://arxiv.org/abs/2505.09546v1
- Date: Wed, 14 May 2025 16:45:51 GMT
- Title: Distilling Realizable Students from Unrealizable Teachers
- Authors: Yujin Kim, Nathaniel Chin, Arnav Vasudev, Sanjiban Choudhury
- Abstract summary: We study policy distillation under privileged information, where a student policy with only partial observations must learn from a teacher with full-state access. Existing approaches either modify the teacher to produce realizable but sub-optimal demonstrations or rely on the student to explore missing information independently. We introduce two methods: (i) an imitation learning approach that adaptively determines when the student should query the teacher for corrections, and (ii) a reinforcement learning approach that selects where to initialize training for efficient exploration.
- Score: 9.968083244726941
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study policy distillation under privileged information, where a student policy with only partial observations must learn from a teacher with full-state access. A key challenge is information asymmetry: the student cannot directly access the teacher's state space, leading to distributional shifts and policy degradation. Existing approaches either modify the teacher to produce realizable but sub-optimal demonstrations or rely on the student to explore missing information independently, both of which are inefficient. Our key insight is that the student should strategically interact with the teacher, querying only when necessary and resetting from recovery states, to stay on a recoverable path within its own observation space. We introduce two methods: (i) an imitation learning approach that adaptively determines when the student should query the teacher for corrections, and (ii) a reinforcement learning approach that selects where to initialize training for efficient exploration. We validate our methods in both simulated and real-world robotic tasks, demonstrating significant improvements over standard teacher-student baselines in training efficiency and final performance. The project website is available at: https://portal-cornell.github.io/CritiQ_ReTRy/
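To make the two methods concrete, here is a minimal Python sketch of the first one, adaptive teacher querying during imitation. Everything in it is an assumption for illustration: the ensemble-disagreement uncertainty criterion, the environment interface exposing both the partial observation and the privileged state, and all names; the paper's actual query rule may differ.

```python
# Hedged sketch: query the privileged teacher only at uncertain states.
import numpy as np

def rollout_with_adaptive_queries(env, student, teacher,
                                  query_threshold=0.5, horizon=200):
    """Collect one episode, querying the teacher only when the student is unsure."""
    obs, priv_state = env.reset()       # obs: partial view; priv_state: full state
    corrections = []                    # (observation, teacher_action) pairs
    for _ in range(horizon):
        # Disagreement across an ensemble of student heads as an uncertainty proxy.
        candidate_actions = np.stack([head(obs) for head in student.ensemble])
        uncertainty = candidate_actions.std(axis=0).mean()
        if uncertainty > query_threshold:
            action = teacher.act(priv_state)        # query for a correction
            corrections.append((obs, action))       # label with the teacher's action
        else:
            action = student.act(obs)               # act autonomously
        obs, priv_state, done = env.step(action)
        if done:
            break
    return corrections                  # aggregated into the student's training set
```

The second method would analogously bias where `env.reset` starts episodes, initializing training from recovery states so that exploration stays on recoverable paths.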
Related papers
- When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets? [0.0]
We present our submission to the BabyLM challenge, aiming to push the boundaries of data-efficient language model pretraining.
We address the limitation of treating students equally by formulating weighted mutual learning as a bi-level optimization problem.
Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.
arXiv Detail & Related papers (2024-11-25T15:25:31Z)
- Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of their future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
- Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion over Diverse Terrains [6.967583364984562]
This work proposes a novel one-stage training framework, Learn to Teach (L2T), which unifies teacher and student policy learning.
Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexity and training time.
We validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ challenging terrains without depth estimation modules.
arXiv Detail & Related papers (2024-02-09T21:16:43Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA yields a significant performance gain over standard SFT.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving [64.57963116462757]
State-of-the-art methods usually follow the 'Teacher-Student' paradigm.
The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model.
We propose DriveAdapter, which employs adapters with a feature alignment objective between the student (perception) and teacher (planning) modules.
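As a rough illustration of the adapter idea, here is a sketch assuming PyTorch; the feature dimensions, module structure, and MSE alignment loss are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Maps student (perception) features into the teacher (planning) feature space."""
    def __init__(self, student_dim=256, teacher_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(student_dim, teacher_dim),
            nn.ReLU(),
            nn.Linear(teacher_dim, teacher_dim),
        )

    def forward(self, student_feat):
        return self.proj(student_feat)

adapter = Adapter()
align_loss = nn.MSELoss()
# In practice these features would come from the two modules' forward passes.
student_feat = torch.randn(8, 256)
teacher_feat = torch.randn(8, 512)
loss = align_loss(adapter(student_feat), teacher_feat)
loss.backward()
```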
arXiv Detail & Related papers (2023-08-01T09:21:53Z)
- Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization [84.86241161706911]
We show that teacher LLMs can indeed intervene on student reasoning to improve their performance.
We also demonstrate that, in multi-turn interactions, teacher explanations generalize and students learn from the explained data.
We verify that misaligned teachers can lower student performance to random chance by intentionally misleading them.
arXiv Detail & Related papers (2023-06-15T17:27:20Z)
- Random Teachers are Good Teachers [19.74244993871716]
We investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
When distilling a student into such a random teacher, we observe a strong improvement of the distilled student over its teacher in terms of probing accuracy.
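A toy sketch of this setup, assuming PyTorch (the architecture, data, and hyperparameters are placeholders): the teacher keeps its random initialization and the student is simply regressed onto its outputs.

```python
import torch
import torch.nn as nn

# Both networks share an architecture; the teacher keeps its random init.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for p in teacher.parameters():
    p.requires_grad_(False)                  # the random teacher is never trained

opt = torch.optim.SGD(student.parameters(), lr=0.1)
for _ in range(100):
    x = torch.randn(64, 32)                  # unlabeled inputs
    loss = ((student(x) - teacher(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# Probing accuracy would then be measured on the student's learned representations.
```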
arXiv Detail & Related papers (2023-02-23T15:26:08Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning scheme comprising two teacher-student networks.
A fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, enforcing consistent predictions on each model fragment under noise.
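A minimal sketch of such a fragment-wise temporal moving average, assuming PyTorch and treating named submodules as the "fragments" (the paper's actual granularity may differ):

```python
import torch

@torch.no_grad()
def ema_update_fragments(teacher, student, decay=0.999,
                         fragments=("encoder", "head")):
    """Move each named teacher fragment toward its student counterpart."""
    for name in fragments:
        t_module = getattr(teacher, name)    # e.g. teacher.encoder
        s_module = getattr(student, name)
        for t_p, s_p in zip(t_module.parameters(), s_module.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)  # temporal moving average
```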
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with a priori bias, which helps find samples that match the teacher's strengths and the student's weaknesses.
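One plausible reading of that criterion, sketched in PyTorch with hypothetical models: keep the augmented candidates on which the teacher is confident but the student diverges most.

```python
import torch
import torch.nn.functional as F

def select_useful_augmentations(teacher, student, candidates, k=16):
    """candidates: tensor of augmented inputs; returns the top-k to distill on."""
    with torch.no_grad():
        t_logits = teacher(candidates)
        s_logits = student(candidates)
        # Teacher's strength: high confidence on the sample.
        t_conf = t_logits.softmax(dim=-1).max(dim=-1).values
        # Student's weakness: large divergence from the teacher.
        gap = F.kl_div(s_logits.log_softmax(-1), t_logits.softmax(-1),
                       reduction="none").sum(-1)
        score = t_conf * gap
    return candidates[score.topk(k).indices]
```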
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
- Switchable Online Knowledge Distillation [68.2673580932132]
Online Knowledge Distillation (OKD) improves the models involved by reciprocally exploiting the difference between teacher and student.
We propose Switchable Online Knowledge Distillation (SwitOKD) to better exploit this difference during training.
arXiv Detail & Related papers (2022-09-12T03:03:40Z)
- Know Thy Student: Interactive Learning with Gaussian Processes [11.641731210416102]
Our work proposes a simple diagnosis algorithm which uses Gaussian processes for inferring student-related information, before constructing a teaching dataset.
We study this in the offline reinforcement learning setting where the teacher must provide demonstrations to the student and avoid sending redundant trajectories.
Our experiments highlight the importance of diagnosing before teaching and demonstrate how students can learn more efficiently with the help of an interactive teacher.
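As a tiny illustration of diagnosing before teaching with a Gaussian process (scikit-learn; the one-dimensional task parameterization and the "teach where predicted weakest" rule are assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

probe_tasks = np.linspace(0, 1, 5).reshape(-1, 1)       # task parameters probed so far
student_returns = np.array([0.9, 0.7, 0.2, 0.1, 0.8])   # measured student performance

gp = GaussianProcessRegressor().fit(probe_tasks, student_returns)
all_tasks = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = gp.predict(all_tasks, return_std=True)
# Teach where the student is predicted weakest (lower confidence bound):
next_task = all_tasks[np.argmin(mean - std)]
```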
arXiv Detail & Related papers (2022-04-26T04:43:57Z)
- The Wits Intelligent Teaching System: Detecting Student Engagement During Lectures Using Convolutional Neural Networks [0.30458514384586394]
The Wits Intelligent Teaching System (WITS) aims to assist lecturers with real-time feedback regarding student affect.
A CNN based on AlexNet is successfully trained and significantly outperforms a Support Vector Machine approach.
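For concreteness, a torchvision-based skeleton of such a classifier (the two-class engagement head and the weight choice are assumptions):

```python
import torch.nn as nn
from torchvision import models

# AlexNet backbone with the final layer replaced for engagement detection.
model = models.alexnet(weights=None)       # plug in pretrained weights if desired
model.classifier[6] = nn.Linear(4096, 2)   # hypothetical: engaged vs. disengaged
```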
arXiv Detail & Related papers (2021-05-28T12:59:37Z)