Exploring Knowledge Distillation of a Deep Neural Network for
Multi-Script identification
- URL: http://arxiv.org/abs/2102.10335v1
- Date: Sat, 20 Feb 2021 12:54:07 GMT
- Title: Exploring Knowledge Distillation of a Deep Neural Network for
Multi-Script identification
- Authors: Shuvayan Ghosh Dastidar, Kalpita Dutta, Nibaran Das, Mahantapas Kundu
and Mita Nasipuri
- Abstract summary: Multi-lingual script identification is a difficult task involving different languages with complex backgrounds in scene text images.
Deep neural networks are employed as teacher models to train a smaller student network by utilizing the teacher model's predictions.
- Score: 8.72467690936929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-lingual script identification is a difficult task involving
different languages with complex backgrounds in scene text images. In the
current research scenario, deep neural networks are employed as teacher models
to train a smaller student network by utilizing the teacher model's
predictions. This process is known as dark knowledge transfer. It has been
quite successful in many domains where the final result is unachievable by
directly training the student network with its simple architecture. In this
paper, we explore the dark knowledge transfer approach in the domain of
multi-script identification from natural scene text images, using a long
short-term memory (LSTM) and CNN based assistant model and various deep neural
networks as teacher models, together with a simple CNN based student network.
We explore the performance of different teacher models and their ability to
transfer knowledge to the student network. Despite the small student network's
limited size, our approach obtains satisfactory results on CVSI-2015, a
well-known script identification dataset.
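To make the dark knowledge transfer idea concrete, here is a minimal sketch of the standard temperature-scaled distillation objective (soft teacher targets blended with hard labels) in PyTorch. The temperature `T`, weight `alpha`, and the teacher/student names are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target (dark knowledge) and hard-label losses.

    The teacher's temperature-softened probabilities carry inter-class
    similarity information that one-hot script labels do not.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher is frozen; only the small student is updated.
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)
```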
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
They require the images of a scene from different camera views to be available for one-time training.
This is expensive especially for scenarios with large-scale scenes and limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem; a rough sketch of the uncertainty-filtering idea follows this entry.
arXiv Detail & Related papers (2022-12-21T11:43:20Z)
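As a rough, hedged illustration of the "uncertainty-filtered" distillation named in the UNIKD title above, the sketch below keeps only those teacher targets whose estimated uncertainty falls below a threshold. The threshold `tau` and the source of the `uncertainty` tensor (e.g. an ensemble variance or a learned head) are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def uncertainty_filtered_kd(student_pred, teacher_pred, uncertainty, tau=0.1):
    """Distill from a frozen old model only where it is confident.

    `uncertainty` is assumed to be a per-output estimate aligned with
    `teacher_pred`; outputs above `tau` are dropped from the loss.
    """
    mask = (uncertainty < tau).float()
    per_elem = F.mse_loss(student_pred, teacher_pred.detach(), reduction="none")
    return (per_elem * mask).sum() / mask.sum().clamp(min=1.0)
```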
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on the CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results; a simplified multi-teacher distillation sketch is given after this entry.
arXiv Detail & Related papers (2022-10-27T01:03:39Z)
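Below is a minimal sketch of multi-teacher distillation under the simplifying assumption that the teachers' temperature-softened predictions are averaged with equal weights; the cited paper's collaborative weighting and its quantization of the student are not reproduced here.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=4.0):
    """Cross-entropy against the equally weighted average of several
    teachers' temperature-softened predictions (an assumed weighting,
    not the paper's exact scheme)."""
    soft_targets = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list], dim=0
    ).mean(dim=0)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean() * (T * T)
```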
- The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and a network representational structure that allow a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z)
- Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation [51.21190751266442]
Domain adaptation (DA) tries to tackle scenarios in which the test data does not fully follow the same distribution as the training data.
By learning from large-scale unlabeled samples, self-supervised learning has now become a new trend in deep learning.
We propose a novel Self-Supervised Graph Neural Network (SSG) to enable more effective inter-task information exchange and knowledge sharing.
arXiv Detail & Related papers (2022-04-08T03:37:56Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one convolutional neural network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be conducted more than once, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Teacher-Class Network: A Neural Network Compression Mechanism [2.257416403770908]
Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge to each student.
Our students are not trained on problem-specific logits; they are trained to mimic the knowledge (dense representation) learned by the teacher network, as sketched after this entry.
The proposed teacher-class architecture is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB Movie Reviews, CAMVid, CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-04-07T11:31:20Z)
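A simplified sketch of the chunked mimicking described above: the teacher's dense embedding is split into equal slices and each small student regresses one slice. The MLP students, the MSE objective, and the even split via `torch.chunk` are placeholder assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChunkStudent(nn.Module):
    """Tiny student that predicts one slice of the teacher's embedding."""
    def __init__(self, in_dim, chunk_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, chunk_dim)
        )

    def forward(self, x):
        return self.net(x)

def teacher_class_loss(students, teacher_embedding, x):
    """Each student mimics its assigned slice of the teacher's embedding.

    Assumes the embedding dimension divides evenly by the number of
    students and that `x` is a flattened input batch of shape (B, in_dim).
    """
    chunks = torch.chunk(teacher_embedding.detach(), len(students), dim=1)
    return sum(F.mse_loss(s(x), c) for s, c in zip(students, chunks))
```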
This list is automatically generated from the titles and abstracts of the papers on this site.