torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation
- URL: http://arxiv.org/abs/2011.12913v2
- Date: Wed, 27 Jan 2021 19:13:21 GMT
- Title: torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation
- Authors: Yoshitomo Matsubara
- Abstract summary: We present an open-source framework built on PyTorch and dedicated to knowledge distillation studies.
The framework is designed to enable users to design experiments through declarative PyYAML configuration files.
We reproduce some of the original experimental results of existing knowledge distillation methods on the ImageNet and COCO datasets, as presented at major machine learning conferences.
- Score: 1.8579693774597703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While knowledge distillation (transfer) has been attracting attention from
the research community, recent developments in the field have heightened the
need for reproducible studies and highly generalized frameworks to lower the
barriers to such high-quality, reproducible deep learning research. Several
researchers have voluntarily published the frameworks used in their knowledge
distillation studies to help other interested researchers reproduce their
original work. Such frameworks, however, are usually neither well generalized
nor maintained, so researchers still need to write a lot of code to
refactor or build on them when introducing new methods, models, and datasets
or when designing experiments. In this paper, we present an open-source
framework built on PyTorch and dedicated to knowledge distillation studies.
The framework is designed to enable users to design experiments through declarative
PyYAML configuration files, and it helps researchers complete the recently
proposed ML Code Completeness Checklist. Using the developed framework, we
demonstrate various efficient training strategies and implement a variety
of knowledge distillation methods. We also reproduce some of the original
experimental results of these methods on the ImageNet and COCO datasets, as presented at major
machine learning conferences such as ICLR, NeurIPS, CVPR and ECCV, including
recent state-of-the-art methods. All the source code, configurations, log files
and trained model weights are publicly available at
https://github.com/yoshitomo-matsubara/torchdistill .
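
As an illustration of the configuration-driven design described in the abstract, the Python sketch below parses a small declarative PyYAML configuration and instantiates a teacher-student pair from it. The configuration keys used here (teacher, student, distillation) are illustrative assumptions rather than torchdistill's actual schema; the repository linked above contains the real configuration files.

    # Minimal sketch, assuming PyYAML and torchvision are installed.
    # The config keys below are hypothetical, not torchdistill's real schema.
    import yaml
    import torchvision.models as models

    CONFIG = """
    teacher:
      name: resnet34
    student:
      name: resnet18
    distillation:
      temperature: 4.0
      alpha: 0.5
    """

    cfg = yaml.safe_load(CONFIG)
    # Instantiate models by name, as a configuration-driven framework would.
    teacher = getattr(models, cfg["teacher"]["name"])()
    student = getattr(models, cfg["student"]["name"])()
    print(type(teacher).__name__, type(student).__name__, cfg["distillation"])

Instantiating models by name from a parsed dictionary is the basic mechanism that lets such a framework remain coding-free for end users.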
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning [136.89318317245855]
MoErging aims to recycle expert models to create an aggregate system with improved performance or generalization.
A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application.
This survey includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method.
arXiv Detail & Related papers (2024-08-13T17:49:00Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval enhancement can be extended to a broader spectrum of machine learning (ML) problems.
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains with consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph [1.7418328181959968]
The proposed research aims to develop an innovative semantic query processing system.
It enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University.
arXiv Detail & Related papers (2024-05-24T09:19:45Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open-source Python library that allows researchers to implement powerful large language model workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models [0.0]
The study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature.
The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy.
The framework consistently delivers accurate domain-specific responses with minimal human oversight.
arXiv Detail & Related papers (2023-12-31T17:15:25Z)
- torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP [3.0875505950565856]
We present a significantly upgraded version of torchdistill, a modular, coding-free deep learning framework.
We reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill.
All 27 fine-tuned BERT models and the configurations to reproduce the results are published on Hugging Face.
arXiv Detail & Related papers (2023-10-26T17:57:15Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- PyRelationAL: A Library for Active Learning Research and Development [0.11545092788508224]
PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
arXiv Detail & Related papers (2022-05-23T08:21:21Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models [3.770437296936382]
We review the characteristics of knowledge distillation under the hypothesis that its three important ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process (a minimal example of such a loss is sketched after this list).
We also outline future work in knowledge distillation, including explainable knowledge distillation, where the performance gain is analyzed analytically, and self-supervised learning, a hot research topic in the deep learning community.
arXiv Detail & Related papers (2020-11-30T05:22:02Z)
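
As a concrete reference point for the "distilled knowledge and loss" ingredient named in the last survey above, the following is a minimal sketch of the classic soft-target distillation loss of Hinton et al. (2015); the temperature and weighting values are illustrative assumptions, and the code is not drawn from any of the listed papers' implementations.

    # Minimal sketch of the soft-target knowledge distillation loss (Hinton et al., 2015).
    # Temperature and alpha are illustrative values, not taken from any specific paper.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
        # Soften both distributions and match them with KL divergence,
        # scaled by T^2 to keep gradient magnitudes comparable.
        soft_student = F.log_softmax(student_logits / temperature, dim=1)
        soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
        distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
        # Standard cross-entropy against the ground-truth labels.
        hard = F.cross_entropy(student_logits, targets)
        return alpha * distill + (1.0 - alpha) * hard

    # Toy usage: random logits for a batch of 8 examples and 10 classes.
    s, t = torch.randn(8, 10), torch.randn(8, 10)
    y = torch.randint(0, 10, (8,))
    print(kd_loss(s, t, y).item())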
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.