torchdistill: A Modular, Configuration-Driven Framework for Knowledge
Distillation
- URL: http://arxiv.org/abs/2011.12913v2
- Date: Wed, 27 Jan 2021 19:13:21 GMT
- Title: torchdistill: A Modular, Configuration-Driven Framework for Knowledge
Distillation
- Authors: Yoshitomo Matsubara
- Abstract summary: We present an open-source framework built on PyTorch and dedicated to knowledge distillation studies.
The framework is designed to let users design experiments through declarative PyYAML configuration files.
We reproduce some of the original experimental results of knowledge distillation methods on the ImageNet and COCO datasets, as presented at major machine learning conferences.
- Score: 1.8579693774597703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While knowledge distillation (transfer) has been attracting attention from
the research community, recent developments in the field have heightened the
need for reproducible studies and highly generalized frameworks to lower
barriers to such high-quality, reproducible deep learning research. Several
researchers voluntarily published frameworks used in their knowledge
distillation studies to help other interested researchers reproduce their
original work. Such frameworks, however, are usually neither well generalized
nor maintained, so researchers are still required to write a lot of code to
refactor or build on the frameworks when introducing new methods, models, and datasets
and designing experiments. In this paper, we present an open-source
framework built on PyTorch and dedicated to knowledge distillation studies.
The framework is designed to enable users to design experiments by declarative
PyYAML configuration files, and helps researchers complete the recently
proposed ML Code Completeness Checklist. Using the developed framework, we
demonstrate its various efficient training strategies, and implement a variety
of knowledge distillation methods. We also reproduce some of their original
experimental results on the ImageNet and COCO datasets presented at major
machine learning conferences such as ICLR, NeurIPS, CVPR and ECCV, including
recent state-of-the-art methods. All the source code, configurations, log files
and trained model weights are publicly available at
https://github.com/yoshitomo-matsubara/torchdistill .
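As a rough illustration of what configuration-driven experiment design can look like, the minimal Python sketch below builds a distillation loss from a declarative configuration. It is not torchdistill's API: the registry, the config keys (`type`, `params`), and all function names are hypothetical, and a plain dict stands in for a parsed YAML file so the example stays self-contained. The loss shown is the common temperature-softened KL-divergence formulation of knowledge distillation, written in pure Python rather than PyTorch.

```python
import math

# Hypothetical registry mapping config names to loss factories
# (an assumption for illustration, not torchdistill's actual API).
LOSS_REGISTRY = {}


def register_loss(name):
    """Decorator that stores a loss factory under a config-visible name."""
    def decorator(factory):
        LOSS_REGISTRY[name] = factory
        return factory
    return decorator


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


@register_loss("kd_kl")
def make_kd_loss(temperature=1.0):
    """Return a loss computing T^2 * KL(teacher || student) on softened outputs."""
    def loss(student_logits, teacher_logits):
        p = softmax(teacher_logits, temperature)  # soft targets from the teacher
        q = softmax(student_logits, temperature)  # student predictions
        kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        return temperature ** 2 * kl
    return loss


def build_loss(config):
    """Instantiate a loss from a declarative config entry."""
    factory = LOSS_REGISTRY[config["type"]]
    return factory(**config.get("params", {}))


# A plain dict standing in for a parsed YAML experiment file (hypothetical keys).
config = {"loss": {"type": "kd_kl", "params": {"temperature": 4.0}}}
kd_loss = build_loss(config["loss"])
print(kd_loss([2.0, 0.5, -1.0], [2.5, 0.3, -1.2]))
```

In a setup like this, switching methods or hyperparameters means editing the `type` and `params` entries of the configuration rather than the training code.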
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning [136.89318317245855]
MoErging aims to recycle expert models to create an aggregate system with improved performance or generalization.
A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application.
This survey includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method.
arXiv Detail & Related papers (2024-08-13T17:49:00Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph [1.7418328181959968]
The proposed research aims to develop an innovative semantic query processing system.
It enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University.
arXiv Detail & Related papers (2024-05-24T09:19:45Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models [0.0]
The study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature.
The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy.
The framework consistently delivers accurate domain-specific responses with minimal human oversight.
arXiv Detail & Related papers (2023-12-31T17:15:25Z)
- torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP [3.0875505950565856]
We present a significantly upgraded version of torchdistill, a modular-driven coding-free deep learning framework.
We reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill.
All the 27 fine-tuned BERT models and configurations to reproduce the results are published at Hugging Face.
arXiv Detail & Related papers (2023-10-26T17:57:15Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models [3.770437296936382]
We review the characteristics of knowledge distillation from the hypothesis that the three important ingredients of knowledge distillation are distilled knowledge and loss, teacher-student paradigm, and the distillation process.
We present directions for future work in knowledge distillation, including explainable knowledge distillation, where the performance gain is studied analytically, and self-supervised learning, which is a hot research topic in the deep learning community.
arXiv Detail & Related papers (2020-11-30T05:22:02Z)
- dagger: A Python Framework for Reproducible Machine Learning Experiment Orchestration [0.913755431537592]
Multi-stage experiments in machine learning often involve state-mutating operations acting on models along multiple paths of execution.
We present dagger, a framework to facilitate reproducible and reusable experiment orchestration.
arXiv Detail & Related papers (2020-06-12T21:42:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.