ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised
Memory-efficient Medical Image Segmentation
- URL: http://arxiv.org/abs/2207.01900v1
- Date: Tue, 5 Jul 2022 08:58:15 GMT
- Title: ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised
Memory-efficient Medical Image Segmentation
- Authors: Ziyuan Zhao, Andong Zhu, Zeng Zeng, Bharadwaj Veeravalli, Cuntai Guan
- Abstract summary: High-accuracy deep models usually come in large model sizes, limiting their deployment in real scenarios.
We propose a novel asymmetric co-teacher framework, ACT-Net, to alleviate the burden of both expensive annotations and computational costs for semi-supervised knowledge distillation.
- Score: 19.528360162691342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep models have shown promising performance in medical image
segmentation, they heavily rely on a large amount of well-annotated data, which
is difficult to access, especially in clinical practice. On the other hand,
high-accuracy deep models usually come in large model sizes, limiting their
deployment in real scenarios. In this work, we propose a novel asymmetric
co-teacher framework, ACT-Net, to alleviate the burden of both expensive
annotations and computational costs for semi-supervised knowledge distillation.
We advance teacher-student learning with a co-teacher network to facilitate
asymmetric knowledge distillation from large models to small ones by
alternating student and teacher roles, obtaining tiny but accurate models for
clinical deployment. To verify the effectiveness of our ACT-Net, we employ the
ACDC dataset for cardiac substructure segmentation in our experiments.
Extensive experimental results demonstrate that ACT-Net outperforms other
knowledge distillation methods and achieves lossless segmentation performance
with 250x fewer parameters.
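Below is a minimal, hedged sketch of the kind of asymmetric teacher-student distillation with alternating roles that the abstract describes. The abstract does not specify the co-teacher architecture, the role-switching schedule, the loss weights, or the data pipeline, so the model and loader names, the even/odd alternation, and all hyperparameters here are illustrative assumptions rather than the paper's actual method.

```python
# Hedged sketch of semi-supervised knowledge distillation with alternating
# teacher/student roles between a large and a small segmentation network.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def distill_step(teacher, student, optimizer, images, labels=None, T=2.0, alpha=0.5):
    """One update of `student` using soft targets from `teacher`.
    `labels` is None for unlabeled batches (semi-supervised setting)."""
    teacher.eval()
    student.train()
    with torch.no_grad():
        t_logits = teacher(images)                      # [B, C, H, W]
    s_logits = student(images)

    # Output-level distillation: KL between temperature-softened distributions.
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = kd
    if labels is not None:
        # Supervised term on labeled data (plain cross-entropy here).
        loss = alpha * F.cross_entropy(s_logits, labels) + (1.0 - alpha) * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def train(large_net, small_net, opt_large, opt_small,
          labeled_loader, unlabeled_loader, iterations=1000):
    """Alternate roles: on even steps the large model teaches the small one,
    on odd steps the small model teaches the large one. This schedule is an
    assumption for illustration, not the schedule used by ACT-Net."""
    labeled_iter, unlabeled_iter = iter(labeled_loader), iter(unlabeled_loader)
    for step in range(iterations):
        try:
            x_l, y_l = next(labeled_iter)
        except StopIteration:
            labeled_iter = iter(labeled_loader)
            x_l, y_l = next(labeled_iter)
        try:
            x_u = next(unlabeled_iter)          # loader assumed to yield images only
        except StopIteration:
            unlabeled_iter = iter(unlabeled_loader)
            x_u = next(unlabeled_iter)

        if step % 2 == 0:
            teacher, student, opt = large_net, small_net, opt_small
        else:
            teacher, student, opt = small_net, large_net, opt_large

        distill_step(teacher, student, opt, x_l, y_l)   # labeled batch
        distill_step(teacher, student, opt, x_u, None)  # unlabeled batch
```

After training, only the compact network would be kept for inference, which is where the parameter savings described in the abstract come from.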
Related papers
- Few-Shot Airway-Tree Modeling using Data-Driven Sparse Priors [0.0]
Few-shot learning approaches make it cost-effective to transfer pre-trained models using only limited annotated data.
We train a data-driven sparsification module to enhance airways efficiently in lung CT scans.
We then incorporate these sparse representations in a standard supervised segmentation pipeline as a pretraining step to enhance the performance of the DL models.
arXiv Detail & Related papers (2024-07-05T13:46:11Z)
- Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation [0.0]
This thesis aims to investigate the feasibility of knowledge transfer between neural networks for medical image segmentation tasks.
In the context of medical imaging, where the data volumes are often limited, leveraging knowledge from a larger pre-trained network could be useful.
arXiv Detail & Related papers (2024-06-05T12:06:04Z)
- TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods overcome this by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z)
- Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models [4.711401719735324]
Knowledge Distillation (KD) has been proposed as a compression method and an acceleration technology.
KD is an efficient learning strategy that can transfer knowledge from a burdensome model to a lightweight model.
We develop a KD-based deep model for prostate MRI segmentation in this work by combining feature-based distillation with Kullback-Leibler divergence, Lovasz, and Dice losses (a hedged sketch of such a combined loss appears after this list).
arXiv Detail & Related papers (2023-03-16T17:15:08Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- SECP-Net: SE-Connection Pyramid Network of Organ At Risk Segmentation for Nasopharyngeal Carcinoma [0.0]
Deep learning models have been widely applied in medical image segmentation tasks.
Traditional deep neural networks underperform during segmentation due to their limited use of global and multi-size information.
This paper proposes a new SE-Connection Pyramid Network (SECP-Net) for improving the segmentation performance.
arXiv Detail & Related papers (2021-12-28T07:48:18Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance drops sharply when the model size decreases.
We propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Contrastive Distillation on Intermediate Representations for Language Model Compression [89.31786191358802]
We propose Contrastive Distillation on Intermediate Representations (CoDIR) as a principled knowledge distillation framework.
By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in the teacher's hidden layers.
CoDIR can be readily applied to compress large-scale language models in both pre-training and finetuning stages, and achieves superb performance on the GLUE benchmark.
arXiv Detail & Related papers (2020-09-29T17:31:43Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
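The "Knowledge Distillation for Adaptive MRI Prostate Segmentation" entry above names a concrete recipe: feature-based distillation combined with Kullback-Leibler divergence, Lovasz, and Dice losses. Below is a minimal sketch of what such a combined distillation loss can look like; the Lovasz term is omitted for brevity, and the loss weights, feature shapes, and helper names are assumptions, not that paper's implementation.

```python
# Hedged sketch of a combined segmentation distillation loss: output-level KL
# distillation + supervised Dice + feature-map matching. Illustrative only.
import torch
import torch.nn.functional as F


def soft_dice_loss(logits, targets, eps=1e-6):
    """Multi-class soft Dice; `targets` holds integer class indices [B, H, W]."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=probs.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = torch.sum(probs * one_hot, dims)
    union = torch.sum(probs + one_hot, dims)
    return 1.0 - torch.mean((2.0 * intersection + eps) / (union + eps))


def kd_segmentation_loss(s_logits, t_logits, s_feat, t_feat, targets,
                         T=2.0, w_kl=1.0, w_dice=1.0, w_feat=0.1):
    # Output-level distillation: KL between temperature-softened distributions.
    kl = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Supervised region-overlap term against the ground-truth masks.
    dice = soft_dice_loss(s_logits, targets)
    # Feature-based distillation: match intermediate representations
    # (assumes the student feature map is already projected to the teacher's shape).
    feat = F.mse_loss(s_feat, t_feat)
    return w_kl * kl + w_dice * dice + w_feat * feat
```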
This list is automatically generated from the titles and abstracts of the papers on this site.