Learn More for Food Recognition via Progressive Self-Distillation
- URL: http://arxiv.org/abs/2303.05073v2
- Date: Tue, 15 Aug 2023 08:27:26 GMT
- Title: Learn More for Food Recognition via Progressive Self-Distillation
- Authors: Yaohui Zhu, Linhu Liu, Jiang Tian
- Abstract summary: We propose a Progressive Self-Distillation (PSD) method for food recognition.
By using progressive training, the teacher network incrementally improves its ability to mine more discriminative regions.
In the inference phase, only the teacher network is used, without the help of the student network.
- Score: 12.046694471161262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Food recognition has a wide range of applications, such as
health-aware recommendation and self-service restaurants. Most previous food
recognition methods first locate informative regions in a weakly-supervised
manner and then aggregate their features. However, errors in locating these
informative regions limit the effectiveness of such methods. Instead of
locating multiple regions, we propose a Progressive Self-Distillation (PSD)
method, which progressively enhances the network's ability to mine more
details for food recognition. The training of PSD comprises multiple
simultaneous self-distillations, in which a teacher network and a student
network share the same embedding network. Since the student network receives
a modified image from its teacher network, produced by masking some
informative regions, the teacher network outputs stronger semantic
representations than the student network. Guided by such a teacher network
with stronger semantics, the student network is encouraged to mine more
useful regions from the modified image, thereby enhancing its own ability.
The ability of the teacher network is also enhanced through the shared
embedding network. Through progressive training, the teacher network
incrementally improves its ability to mine more discriminative regions. In
the inference phase, only the teacher network is used, without the help of
the student network. Extensive experiments on three datasets demonstrate the
effectiveness of our proposed method and its state-of-the-art performance.
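To make the training procedure described in the abstract concrete, below is a minimal sketch of one PSD-style training step in PyTorch. The abstract does not specify the backbone, the masking rule, or the distillation loss, so the toy CNN embedding, the activation-quantile masking, and the softened-logit KL loss are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of one PSD-style training step (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmbedding(nn.Module):
    """Toy stand-in for the embedding network shared by teacher and student."""
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)  # B x dim x H/4 x W/4

class PSDSketch(nn.Module):
    def __init__(self, num_classes=101, dim=64, mask_ratio=0.3, tau=4.0):
        super().__init__()
        self.embed = SharedEmbedding(dim)           # shared by both branches
        self.teacher_head = nn.Linear(dim, num_classes)
        self.student_head = nn.Linear(dim, num_classes)
        self.mask_ratio, self.tau = mask_ratio, tau

    def mask_informative(self, images, feat):
        # Hypothetical masking rule: erase the image regions where the
        # teacher's feature activations are highest (top `mask_ratio`).
        attn = feat.mean(dim=1, keepdim=True)                      # B x 1 x h x w
        attn = F.interpolate(attn, size=images.shape[-2:],
                             mode="bilinear", align_corners=False)
        thresh = torch.quantile(attn.flatten(1), 1 - self.mask_ratio, dim=1)
        keep = (attn < thresh.view(-1, 1, 1, 1)).float()
        return images * keep

    def forward(self, images, labels):
        # Teacher branch sees the original image.
        t_feat = self.embed(images)
        t_logits = self.teacher_head(t_feat.mean(dim=(2, 3)))
        # Student branch sees the image with informative regions masked out.
        masked = self.mask_informative(images, t_feat.detach())
        s_logits = self.student_head(self.embed(masked).mean(dim=(2, 3)))
        # Classification loss for both branches + teacher-to-student distillation.
        ce = F.cross_entropy(t_logits, labels) + F.cross_entropy(s_logits, labels)
        kd = F.kl_div(F.log_softmax(s_logits / self.tau, dim=1),
                      F.softmax(t_logits.detach() / self.tau, dim=1),
                      reduction="batchmean") * self.tau ** 2
        return ce + kd

# One illustrative training step on random data.
model = PSDSketch()
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 101, (4,))
loss = model(images, labels)
loss.backward()
```

At inference, only the teacher branch (shared embedding plus teacher head) would be kept, matching the abstract's note that the student network is not used.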
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Cross Architecture Distillation for Face Recognition [49.55061794917994]
We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-06-26T12:54:28Z)
- ORC: Network Group-based Knowledge Distillation using Online Role Change [3.735965959270874]
We propose an online role change strategy for knowledge distillation with multiple teachers.
The top-ranked networks in the student group can be promoted to the teacher group at every iteration.
We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2022-06-01T10:28:18Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Densely Guided Knowledge Distillation using Multiple Teacher Assistants [5.169724825219126]
We propose a densely guided knowledge distillation scheme using multiple teacher assistants that gradually decrease the model size.
We also design a stochastic teaching strategy where, for each mini-batch, the teacher or some teacher assistants are randomly dropped.
This acts as a regularizer and improves the efficiency of teaching the student network.
arXiv Detail & Related papers (2020-09-18T13:12:52Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive learning-material generation and the teacher/student updates can be conducted more than once, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Teacher-Class Network: A Neural Network Compression Mechanism [2.257416403770908]
Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge to each student.
Our students are not trained on problem-specific logits; instead, they are trained to mimic the knowledge (a dense representation) learned by the teacher network, as illustrated in the sketch after this list.
The proposed teacher-class architecture is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB Movie Reviews, CAMVid, CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-04-07T11:31:20Z)
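As referenced in the Teacher-Class Network entry above, the core idea is to split the teacher's dense representation into chunks and have each student mimic one chunk rather than task logits. The sketch below illustrates that idea under assumed details (equal-sized chunks, MSE as the mimicry loss, toy MLP architectures, a frozen teacher); none of these specifics come from the paper itself.

```python
# Illustrative sketch of the chunked-knowledge idea (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy teacher whose 128-d output plays the role of the "dense representation"
# (architecture chosen for illustration only).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 128))

num_students, chunk = 4, 128 // 4   # each student mimics a 32-d chunk
students = nn.ModuleList(
    nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(),
                  nn.Linear(64, chunk))
    for _ in range(num_students)
)

x = torch.randn(8, 1, 28, 28)       # MNIST-sized dummy batch
with torch.no_grad():               # the teacher is treated as frozen
    dense = teacher(x)              # 8 x 128 dense representation

# Each student regresses only its own chunk of the teacher's representation,
# rather than any problem-specific logits (MSE is an assumed mimicry loss).
loss = sum(F.mse_loss(s(x), dense[:, i * chunk:(i + 1) * chunk])
           for i, s in enumerate(students))
loss.backward()
```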