Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers
- URL: http://arxiv.org/abs/2410.14705v1
- Date: Mon, 07 Oct 2024 20:29:42 GMT
- Title: Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers
- Authors: Paulo Luza Alves, André Hochuli, Luiz Eduardo de Oliveira, Paulo Lisboa de Almeida
- Abstract summary: We propose a robust ensemble of classifiers to serve as Teacher models in image-based parking space classification.
These Teacher models are distilled into lightweight and specialized Student models that can be deployed directly on edge devices.
Our results show that the Student models, with 26 times fewer parameters than the Teacher models, achieved an average accuracy of 96.6% on the target test datasets.
- Abstract: When deploying large-scale machine learning models for smart city applications, such as image-based parking lot monitoring, data often must be sent to a central server to perform classification tasks. This is challenging for the city's infrastructure: image-based applications require transmitting large volumes of data, which in turn demands complex network and hardware infrastructure to process them. To address this issue in image-based parking space classification, we propose creating a robust ensemble of classifiers to serve as Teacher models. These Teacher models are distilled into lightweight and specialized Student models that can be deployed directly on edge devices. Knowledge is distilled to the Student models through pseudo-labeled samples generated by the Teacher model, which are used to fine-tune the Student models on the target scenario. Our results show that the Student models, with 26 times fewer parameters than the Teacher models, achieved an average accuracy of 96.6% on the target test datasets, surpassing the Teacher models, which attained an average accuracy of 95.3%.
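As a rough illustration of the pseudo-labeling step described in the abstract, the sketch below averages the Teacher ensemble's predictions on unlabeled images of the target parking lot, takes the argmax as a hard pseudo-label, and fine-tunes the Student on those labels. This is a minimal PyTorch sketch under assumed names (teacher_ensemble, student, unlabeled_loader); it does not reproduce the paper's exact training recipe (augmentation, confidence filtering, or hyperparameters).

```python
import torch
import torch.nn.functional as F

def distill_with_pseudo_labels(teacher_ensemble, student, unlabeled_loader,
                               epochs=5, lr=1e-4, device="cpu"):
    """Fine-tune a lightweight Student on pseudo-labels produced by an
    ensemble of Teacher classifiers for the target parking lot."""
    for teacher in teacher_ensemble:
        teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for images in unlabeled_loader:   # unlabeled crops of individual parking spaces
            images = images.to(device)
            with torch.no_grad():
                # Average the ensemble's class probabilities and take the argmax
                # as the pseudo-label (e.g. occupied vs. empty).
                probs = torch.stack([F.softmax(t(images), dim=1)
                                     for t in teacher_ensemble]).mean(dim=0)
                pseudo_labels = probs.argmax(dim=1)
            loss = F.cross_entropy(student(images), pseudo_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Averaging softmax outputs rather than logits is one common way to combine an ensemble; the paper may combine Teacher predictions differently.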
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD matches or exceeds the performance of leading methods across various model architectures and sizes, while cutting training time by up to a factor of four.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Self-Regulated Data-Free Knowledge Amalgamation for Text Classification [9.169836450935724]
We develop a lightweight student network that can learn from multiple teacher models without accessing their original training data.
To accomplish this, we propose STRATANET, a modeling framework that produces text data tailored to each teacher.
We evaluate our method on three benchmark text classification datasets with varying labels or domains.
arXiv Detail & Related papers (2024-06-16T21:13:30Z)
- A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models into asymmetric students 1/10th the size that retain 95-97% of the teacher performance; a generic geometry-matching sketch appears after this list.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Teacher Guided Training: An Efficient Framework for Knowledge Transfer [86.6784627427194]
We propose the teacher-guided training (TGT) framework for training a high-quality compact model.
TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain.
We find that TGT can improve accuracy on several image classification benchmarks and a range of text classification and retrieval tasks.
arXiv Detail & Related papers (2022-08-14T10:33:58Z)
- Knowledge Distillation via Weighted Ensemble of Teaching Assistants [18.593268785143426]
Knowledge distillation is the process of transferring knowledge from a large model called the teacher to a smaller model called the student.
When the network size gap between the teacher and student increases, the performance of the student network decreases.
We show that the student model (the smaller model) can be further improved by distilling through multiple teaching assistant models; a generic sketch of such weighted distillation appears after this list.
arXiv Detail & Related papers (2022-06-23T22:50:05Z)
- Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision [38.22842778742829]
Discriminative self-supervised learning allows training models on any random group of internet images.
We train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We extensively study and validate our model performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets.
arXiv Detail & Related papers (2022-02-16T22:26:47Z)
- Sparse Distillation: Speeding Up Text Classification by Using Bigger Models [49.8019791766848]
Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time.
In this paper, we aim to further push the limit of inference speed by exploring a new area in the design space of the student model.
Our experiments show that the student models retain 97% of the RoBERTa-Large teacher performance on a collection of six text classification tasks.
arXiv Detail & Related papers (2021-10-16T10:04:14Z)
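For the "Knowledge Distillation via Weighted Ensemble of Teaching Assistants" entry referenced above, here is a generic sketch of a temperature-scaled distillation loss that blends soft targets from several teaching-assistant models with the usual hard-label loss. The weighting scheme, temperature, and alpha are illustrative assumptions, not that paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_kd_loss(student_logits, ta_logits_list, weights, labels,
                     temperature=4.0, alpha=0.7):
    """Combine soft targets from several teaching assistants (a weighted
    average of their temperature-scaled softmax outputs) with the usual
    hard-label cross-entropy loss."""
    weights = torch.tensor(weights) / sum(weights)
    # Weighted average of the assistants' softened probability distributions.
    soft_targets = sum(w * F.softmax(logits / temperature, dim=1)
                       for w, logits in zip(weights, ta_logits_list))
    log_probs = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

For two assistants, for example, weighted_kd_loss(student_logits, [ta1_logits, ta2_logits], [0.6, 0.4], labels) blends their softened predictions before comparing them with the student's.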
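Similarly, for the EmbedDistill entry, the sketch below shows one generic way to preserve the relative geometry among queries and documents: match the student's in-batch query-document similarity distribution to the teacher's. The actual EmbedDistill objective is more elaborate; treat this purely as an illustration.

```python
import torch
import torch.nn.functional as F

def geometry_distill_loss(student_q, student_d, teacher_q, teacher_d, temperature=1.0):
    """student_q/teacher_q: (B, d) query embeddings; student_d/teacher_d: (B, d) document embeddings.
    Student and teacher embedding dimensions may differ, since only similarity
    matrices are compared."""
    # In-batch cosine-similarity matrices between every query and every document.
    s_sim = F.normalize(student_q, dim=1) @ F.normalize(student_d, dim=1).T
    t_sim = F.normalize(teacher_q, dim=1) @ F.normalize(teacher_d, dim=1).T
    # Push the student's similarity distribution over documents toward the teacher's.
    return F.kl_div(F.log_softmax(s_sim / temperature, dim=1),
                    F.softmax(t_sim / temperature, dim=1),
                    reduction="batchmean")
```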