Inter-Region Affinity Distillation for Road Marking Segmentation
- URL: http://arxiv.org/abs/2004.05304v1
- Date: Sat, 11 Apr 2020 04:26:37 GMT
- Title: Inter-Region Affinity Distillation for Road Marking Segmentation
- Authors: Yuenan Hou, Zheng Ma, Chunxiao Liu, Tak-Wai Hui, Chen Change Loy
- Abstract summary: We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network.
Our method is known as Inter-Region Affinity KD (IntRA-KD).
- Score: 81.3619453527367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of distilling knowledge from a large deep teacher
network to a much smaller student network for the task of road marking
segmentation. In this work, we explore a novel knowledge distillation (KD)
approach that can transfer 'knowledge' on scene structure more effectively from
a teacher to a student model. Our method is known as Inter-Region Affinity KD
(IntRA-KD). It decomposes a given road scene image into different regions and
represents each region as a node in a graph. An inter-region affinity graph is
then formed by establishing pairwise relationships between nodes based on their
similarity in feature distribution. To learn structural knowledge from the
teacher network, the student is required to match the graph generated by the
teacher. The proposed method shows promising results on three large-scale road
marking segmentation benchmarks, i.e., ApolloScape, CULane and LLAMAS, by
taking various lightweight models as students and ResNet-101 as the teacher.
IntRA-KD consistently brings higher performance gains on all lightweight
models, compared to previous distillation methods. Our code is available at
https://github.com/cardwing/Codes-for-IntRA-KD.
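The abstract describes the core mechanism only at a high level: pool features inside each scene region, build a pairwise affinity graph over the pooled descriptors, and train the student to match the teacher's graph. The PyTorch sketch below illustrates one way that idea could be realized; it is a simplified sketch built on illustrative assumptions (mean pooling of region features, cosine similarity as the affinity measure, and a region assignment taken from an integer label map resized to the feature resolution), not the authors' reference implementation, which is available at the GitHub link above.

```python
import torch
import torch.nn.functional as F


def region_descriptors(feat, region_mask, num_regions):
    """Average-pool features inside each region to get one descriptor per region.

    feat:        (B, C, H, W) feature map from the teacher or the student.
    region_mask: (B, H, W) integer map assigning every pixel to a region id in
                 [0, num_regions); assumed to already match feat's spatial size.
    """
    onehot = F.one_hot(region_mask.long(), num_regions).permute(0, 3, 1, 2).float()
    area = onehot.sum(dim=(2, 3)).clamp(min=1.0)            # (B, K) pixels per region
    pooled = torch.einsum('bchw,bkhw->bkc', feat, onehot)   # (B, K, C) summed features
    return pooled / area.unsqueeze(-1)                      # (B, K, C) mean features


def inter_region_affinity(desc):
    """Pairwise cosine similarity between region descriptors -> (B, K, K) graph."""
    desc = F.normalize(desc, dim=-1)
    return desc @ desc.transpose(1, 2)


def intra_kd_distill_loss(student_feat, teacher_feat, region_mask, num_regions):
    """Penalize mismatch between the student's and the teacher's affinity graphs."""
    a_s = inter_region_affinity(region_descriptors(student_feat, region_mask, num_regions))
    with torch.no_grad():
        a_t = inter_region_affinity(region_descriptors(teacher_feat, region_mask, num_regions))
    return F.mse_loss(a_s, a_t)
```

Because the affinity matrices are K x K regardless of feature width, the teacher and student features need not share channel dimensionality, which is one reason a graph-matching objective suits heterogeneous teacher-student pairs such as ResNet-101 and a lightweight backbone.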
Related papers
- Invariant Causal Knowledge Distillation in Neural Networks [6.24302896438145]
In this paper, we introduce Invariant Consistency Distillation (ICD), a novel methodology designed to enhance knowledge distillation.
ICD ensures that the student model's representations are both discriminative and invariant with respect to the teacher's outputs.
Our results on CIFAR-100 and ImageNet ILSVRC-2012 show that ICD outperforms traditional KD techniques and surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T14:53:35Z) - Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features (a minimal sketch of these baseline objectives appears after this list).
While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance.
arXiv Detail & Related papers (2023-05-26T15:05:19Z) - Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples based on extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z) - Cross-Image Relational Knowledge Distillation for Semantic Segmentation [16.0341383592071]
Cross-Image Relational KD (CIRKD) focuses on transferring structured pixel-to-pixel and pixel-to-region relations among whole images.
The motivation is that a good teacher network could construct a well-structured feature space in terms of global pixel dependencies.
CIRKD makes the student mimic better-structured relations from the teacher, thus improving segmentation performance.
arXiv Detail & Related papers (2022-04-14T14:24:19Z) - Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data [56.29595334715237]
Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain.
We introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD.
In MosaicKD, this is achieved through a four-player min-max game in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner under the guidance of the pre-trained teacher.
arXiv Detail & Related papers (2021-10-27T13:01:10Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between the teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - Channel-wise Knowledge Distillation for Dense Prediction [73.99057249472735]
We propose to align features channel-wise between the student and teacher networks.
We consistently achieve superior performance on three benchmarks with various network structures.
arXiv Detail & Related papers (2020-11-26T12:00:38Z) - Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
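Several of the entries above contrast their contributions with the standard distillation baselines that align the student to the teacher through softened logits and intermediate features (see the "Regularizing Feature Norm and Direction" entry, which points to this sketch). The following is a generic, minimal PyTorch version of those two objectives; the temperature value and the 1x1 convolution adapter used to bridge mismatched channel counts are illustrative assumptions rather than details taken from any of the listed papers.

```python
import torch.nn as nn
import torch.nn.functional as F


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student class distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * (t ** 2)


class FeatureKDLoss(nn.Module):
    """L2 distance between intermediate features, with a 1x1 conv adapter so a
    narrower student feature map can be compared against a wider teacher one."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())
```

In practice a training loop adds a weighted sum of such terms to the ordinary task loss (e.g., cross-entropy on the road-marking labels); the relation-based methods listed above replace or supplement these point-wise alignment terms with structural ones.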