Deep Ensemble Collaborative Learning by using Knowledge-transfer Graph for Fine-grained Object Classification
- URL: http://arxiv.org/abs/2103.14845v1
- Date: Sat, 27 Mar 2021 08:56:00 GMT
- Authors: Naoki Okamoto, Soma Minami, Tsubasa Hirakawa, Takayoshi Yamashita,
Hironobu Fujiyoshi
- Abstract summary: The performance of ensembles of networks that have undergone mutual learning does not improve significantly over that of normal ensembles without mutual learning.
This may be due to the relationship between the knowledge in mutual learning and the individuality of the networks in the ensemble.
We propose an ensemble method using knowledge transfer to improve the accuracy of ensembles by introducing a loss design that promotes diversity among networks in mutual learning.
- Score: 9.49864824780503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutual learning, in which multiple networks learn by sharing their knowledge,
improves the performance of each network. However, the performance of ensembles
of networks that have undergone mutual learning does not improve significantly
over that of normal ensembles without mutual learning, even though the
performance of each network has improved significantly. This may be due to the
relationship between the knowledge in mutual learning and the individuality of
the networks in the ensemble. In this study, we propose an ensemble method
using knowledge transfer to improve the accuracy of ensembles by introducing a
loss design that promotes diversity among networks in mutual learning. We use
an attention map as knowledge, which represents the probability distribution
and information in the middle layer of a network. Knowledge-transfer methods
admit many possible combinations of networks and loss designs. Therefore, we
use the automatic optimization of knowledge-transfer graphs to consider a
variety of knowledge-transfer methods by graphically representing conventional
mutual-learning and distillation methods and optimizing each element through
hyperparameter search. The proposed method consists of a mechanism for
constructing an ensemble in a knowledge-transfer graph, attention loss, and a
loss design that promotes diversity among networks. We explore optimal ensemble
learning by optimizing a knowledge-transfer graph to maximize ensemble
accuracy. Through graph exploration and evaluation experiments on the Stanford
Dogs, Stanford Cars, and CUB-200-2011 datasets, we confirm that the proposed
method is more accurate than a conventional ensemble method.
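The loss design described in the abstract — per-network cross-entropy, a mutual-learning term that pulls each network toward its peer, and a diversity term that pushes the two output distributions apart — can be sketched as follows. This is a minimal NumPy illustration under assumptions: the weight `gamma` and the exact (negative-KL) form of the diversity term are hypothetical, and the paper's attention-map transfer is omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # Row-wise KL divergence KL(p || q) over the class axis.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_losses(logits_a, logits_b, labels, gamma=0.5):
    """Loss for each of two mutually learning networks:
    cross-entropy + a mimicry term toward the peer's distribution,
    minus a diversity term (gamma is a hypothetical weight)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    n = len(labels)
    ce_a = -np.log(pa[np.arange(n), labels] + 1e-12).mean()
    ce_b = -np.log(pb[np.arange(n), labels] + 1e-12).mean()
    # Mutual-learning terms: each network mimics its peer.
    mimic_a = kl(pb, pa).mean()
    mimic_b = kl(pa, pb).mean()
    # Diversity term (negative KL) discourages identical outputs.
    loss_a = ce_a + mimic_a - gamma * kl(pa, pb).mean()
    loss_b = ce_b + mimic_b - gamma * kl(pb, pa).mean()
    return loss_a, loss_b
```

With `gamma = 0`, this reduces to standard deep mutual learning on the networks' probability distributions.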
Related papers
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Semantic Enhanced Knowledge Graph for Large-Scale Zero-Shot Learning [74.6485604326913]
We provide a new semantic enhanced knowledge graph that contains both expert knowledge and categories semantic correlation.
To propagate information on the knowledge graph, we propose a novel Residual Graph Convolutional Network (ResGCN).
Experiments conducted on the widely used large-scale ImageNet-21K dataset and AWA2 dataset show the effectiveness of our method.
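A residual graph-convolution layer of the kind the name ResGCN suggests can be sketched as follows. The symmetric normalization and the placement of the residual connection are common GCN conventions assumed here for illustration, not details taken from the paper.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2}(A+I)D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def res_gcn_layer(h, adj_norm, w):
    """One residual graph-convolution layer: neighborhood-aggregated,
    transformed features are added back onto the input (a sketch;
    the paper's ResGCN may differ in detail)."""
    return h + np.maximum(adj_norm @ h @ w, 0.0)  # ReLU update
```

Stacking such layers lets information propagate several hops over the knowledge graph while the residual path preserves the original node features.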
arXiv Detail & Related papers (2022-12-26T13:18:36Z)
- Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated.
Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other.
We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
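The gradient-based graph update described above can be sketched as follows. The cosine-similarity rule and the threshold `tau` are illustrative assumptions; the paper's actual correlation estimate may differ.

```python
import numpy as np

def build_communication_graph(grads, tau=0.0):
    """grads: list of flattened per-task gradient vectors.
    Connect tasks whose gradient cosine similarity exceeds tau
    (positively correlated) and leave negatively correlated pairs
    disconnected (hypothetical sketch of the dynamic graph rule)."""
    k = len(grads)
    adj = np.zeros((k, k), dtype=bool)
    for i in range(k):
        for j in range(i + 1, k):
            sim = grads[i] @ grads[j] / (
                np.linalg.norm(grads[i]) * np.linalg.norm(grads[j]) + 1e-12)
            if sim > tau:
                adj[i, j] = adj[j, i] = True
    return adj
```

Recomputing this adjacency each round yields the dynamic communication graph: beneficial tasks stay connected while conflicting ones are isolated.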
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
- Leveraging Different Learning Styles for Improved Knowledge Distillation in Biomedical Imaging [0.9208007322096533]
Our work endeavors to leverage the concept of knowledge diversification to improve the performance of model compression techniques like Knowledge Distillation (KD) and Mutual Learning (ML).
We use a single-teacher, two-student network in a unified framework that not only allows for the transfer of knowledge from teacher to students (KD) but also encourages collaborative learning between students (ML).
Unlike the conventional approach, where the teacher shares the same knowledge in the form of predictions or feature representations with the student network, our proposed approach employs a more diversified strategy by training one student with predictions and the other with feature maps from the teacher.
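The diversified strategy — one student supervised by the teacher's predictions, the other by its feature maps — can be sketched as below. The specific loss choices (KL on softened outputs, MSE on features) are common distillation conventions assumed for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def diversified_kd_losses(teacher_logits, teacher_feats,
                          student1_logits, student2_feats):
    """Student 1 distills the teacher's output distribution;
    student 2 distills the teacher's intermediate feature maps
    (a hypothetical sketch of the diversified strategy)."""
    loss_s1 = kl(softmax(teacher_logits), softmax(student1_logits))
    loss_s2 = np.mean((teacher_feats - student2_feats) ** 2)
    return loss_s1, loss_s2
```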
arXiv Detail & Related papers (2022-12-06T12:40:45Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing defined knowledge representation related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Distilling Holistic Knowledge with Graph Neural Networks [37.86539695906857]
Knowledge Distillation (KD) aims at transferring knowledge from a larger well-optimized teacher network to a smaller learnable student network.
Existing KD methods have mainly considered two types of knowledge, namely the individual knowledge and the relational knowledge.
We propose to distill the novel holistic knowledge based on an attributed graph constructed among instances.
arXiv Detail & Related papers (2021-08-12T02:47:59Z)
- LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction [42.38793195337463]
We propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3D radiotherapy dose prediction.
Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance.
To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals.
arXiv Detail & Related papers (2021-06-12T10:08:52Z)
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
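The block-swapping step can be sketched as follows. The swap probability `p` is an illustrative parameter, not taken from the paper, and "blocks" here are opaque placeholders for network stages.

```python
import random

def swap_blocks(student_blocks, teacher_blocks, p=0.5, rng=None):
    """Return a mixed network: each position keeps the lower-precision
    student block, or takes the corresponding higher-precision teacher
    block with probability p (hypothetical sketch)."""
    rng = rng or random.Random(0)
    return [t if rng.random() < p else s
            for s, t in zip(student_blocks, teacher_blocks)]
```

Training the student through such randomly mixed networks exposes each low-precision block to high-precision context from the teacher.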
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
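Assigning learnable parameters to edges can be sketched as gated feature aggregation over a complete DAG. The sigmoid gate and the earlier-node ordering are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate(node_feats, edge_params):
    """Each node sums its predecessors' features weighted by a
    sigmoid-gated learnable edge parameter; a gate near zero
    effectively prunes that connection, so connectivity is learned
    differentiably (hypothetical sketch)."""
    n = len(node_feats)
    out = []
    for j in range(n):
        agg = node_feats[j].copy()
        for i in range(j):  # complete DAG: all earlier nodes feed node j
            agg += sigmoid(edge_params[i, j]) * node_feats[i]
        out.append(agg)
    return out
```

Because the gates are differentiable, `edge_params` can be optimized with ordinary gradient descent alongside the network weights.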
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Multivariate Relations Aggregation Learning in Social Networks [39.576490107740135]
In graph learning tasks of social networks, the identification and utilization of multivariate relationship information are more important.
Existing graph learning methods are based on the neighborhood information diffusion mechanism.
This paper proposes the multivariate relationship aggregation learning (MORE) method, which can effectively capture the multivariate relationship information in the network environment.
arXiv Detail & Related papers (2020-08-09T04:58:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.