Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection
- URL: http://arxiv.org/abs/2504.09297v1
- Date: Sat, 12 Apr 2025 17:42:45 GMT
- Title: Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection
- Authors: Huu-Phong Phan-Nguyen, Anh Dao, Tien-Huy Nguyen, Tuan Quang, Huu-Loc Tran, Tinh-Anh Nguyen-Nhu, Huy-Thach Pham, Quan Nguyen, Hoang M. Le, Quang-Vinh Dinh,
- Abstract summary: We propose a novel training framework called Cycle Training, which adopts a three-stage training process that alternates between exploration and stabilization phases to optimize model performance.<n> Comprehensive experiments on the CamSSD dataset for mobile scene detection demonstrate that our framework not only significantly improves classification accuracy but also ensures real-time inference efficiency.
- Score: 3.5291730624600848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, smartphones are ubiquitous, and almost everyone owns one. At the same time, the rapid development of AI has spurred extensive research on applying deep learning techniques to image classification. However, due to the limited resources available on mobile devices, significant challenges remain in balancing accuracy with computational efficiency. In this paper, we propose a novel training framework called Cycle Training, which adopts a three-stage training process that alternates between exploration and stabilization phases to optimize model performance. Additionally, we incorporate Semi-Supervised Domain Adaptation (SSDA) to leverage the power of large models and unlabeled data, thereby effectively expanding the training dataset. Comprehensive experiments on the CamSSD dataset for mobile scene detection demonstrate that our framework not only significantly improves classification accuracy but also ensures real-time inference efficiency. Specifically, our method achieves a 94.00% in Top-1 accuracy and a 99.17% in Top-3 accuracy and runs inference in just 1.61ms using CPU, demonstrating its suitability for real-world mobile deployment.
Related papers
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.
We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.
We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR [4.606106768645647]
3D LiDAR point cloud data is crucial for scene perception in computer vision, robotics, and autonomous driving.
We present DurLAR, the first high-fidelity 128-channel 3D LiDAR dataset featuring panoramic ambient (near infrared) and reflectivity imagery.
To improve the segmentation accuracy, we introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture.
arXiv Detail & Related papers (2024-11-01T14:01:54Z) - Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration [0.0]
Human-Robot Collaboration (HRC) is vital in Industry 4.0, using sensors, digital twins, collaborative robots (cobots) and intention-recognition models to have efficient manufacturing processes.
We address concept drift by integrating Adaptive Intelligence and self-labeling to improve the resilience of intention-recognition in an HRC system.
arXiv Detail & Related papers (2024-09-30T01:25:48Z) - REP: Resource-Efficient Prompting for Rehearsal-Free Continual Learning [23.92661395403251]
Recent rehearsal-free methods, guided by prompts, excel in vision-related continual learning (CL) with drifting data but lack resource efficiency.<n>We introduce Resource-Efficient Prompting (REP), which improves the computational and memory efficiency of prompt-based rehearsal-free methods.<n>Our approach employs swift prompt selection to refine input data using a carefully provisioned model.
arXiv Detail & Related papers (2024-06-07T09:17:33Z) - Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.<n>We name our approach Adaptive Retention & Correction (ARC)<n>ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training [17.158498267947877]
We introduce MobileCLIP, a new family of efficient image-text models optimized for runtime performance.
MobileCLIP uses knowledge transfer from an image captioning model and an ensemble of strong CLIP encoders to improve the accuracy of efficient models.
Our approach avoids train-time compute overhead by storing the additional knowledge in a reinforced dataset.
arXiv Detail & Related papers (2023-11-28T18:55:42Z) - Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data [122.282521548393]
Contrastive Language-Image Pre-training (CLIP) has become the standard for cross-modal image-text representation learning.<n>We introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training.
arXiv Detail & Related papers (2023-05-09T07:00:17Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the
Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S)
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z) - Continual Learning at the Edge: Real-Time Training on Smartphone Devices [11.250227901473952]
This paper describes the implementation and deployment of a hybrid learning strategy (AR1*) on a native Android application for real-time on-device personalization without forgetting.
Our benchmark, based on an extension of the CORe50 dataset, shows the efficiency and effectiveness of our solution.
arXiv Detail & Related papers (2021-05-24T12:00:31Z) - Towards Efficient Scheduling of Federated Mobile Devices under
Computational and Statistical Heterogeneity [16.069182241512266]
This paper studies the implementation of distributed learning on mobile devices.
We use data as a tuning knob and propose two efficient-time algorithms to schedule different workloads.
Compared with the common benchmarks, the proposed algorithms achieve 2-100x speedup-wise, 2-7% accuracy gain and convergence rate by more than 100% on CIFAR10.
arXiv Detail & Related papers (2020-05-25T18:21:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.