A Co-Training Semi-Supervised Framework Using Faster R-CNN and YOLO Networks for Object Detection in Densely Packed Retail Images
- URL: http://arxiv.org/abs/2509.09750v1
- Date: Thu, 11 Sep 2025 13:40:43 GMT
- Title: A Co-Training Semi-Supervised Framework Using Faster R-CNN and YOLO Networks for Object Detection in Densely Packed Retail Images
- Authors: Hossein Yazdanjouei, Arash Mansouri, Mohammad Shokouhifar,
- Abstract summary: This study proposes a semi-supervised co-training framework for object detection in densely packed retail environments. The framework combines Faster R-CNN for precise localization with YOLO for global context. It employs an ensemble of XGBoost, Random Forest, and SVM, utilizing diverse feature representations for higher robustness.
- Score: 1.0896567381206714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study proposes a semi-supervised co-training framework for object detection in densely packed retail environments, where limited labeled data and complex conditions pose major challenges. The framework combines Faster R-CNN (utilizing a ResNet backbone) for precise localization with YOLO (employing a Darknet backbone) for global context, enabling mutual pseudo-label exchange that improves accuracy in scenes with occlusion and overlapping objects. To strengthen classification, it employs an ensemble of XGBoost, Random Forest, and SVM, utilizing diverse feature representations for higher robustness. Hyperparameters are optimized using a metaheuristic-driven algorithm, enhancing precision and efficiency across models. By minimizing reliance on manual labeling, the approach reduces annotation costs and adapts effectively to frequent product and layout changes common in retail. Experiments on the SKU-110k dataset demonstrate strong performance, highlighting the scalability and practicality of the proposed framework for real-world retail applications such as automated inventory tracking, product monitoring, and checkout systems.
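The abstract does not spell out how the two detectors exchange pseudo-labels, so the following is only a minimal sketch of one plausible round of mutual exchange, not the authors' method. It assumes each detector returns boxes with confidence scores on an unlabeled image, and that a box is passed to the peer when the proposing detector is confident and the peer has no overlapping detection. The function names and the thresholds `min_score` and `max_iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pseudo_labels_for_peer(teacher: List[Detection],
                           student: List[Detection],
                           min_score: float = 0.9,
                           max_iou: float = 0.5) -> List[Box]:
    """Boxes the teacher is confident about and the student did not find.

    Assumption: both thresholds are illustrative, not values from the paper.
    """
    labels = []
    for box, score in teacher:
        if score < min_score:
            continue  # teacher is not confident enough to propose this box
        if all(iou(box, s_box) < max_iou for s_box, _ in student):
            labels.append(box)  # student has no matching detection
    return labels


def exchange(dets_a: List[Detection], dets_b: List[Detection]):
    """One exchange round: (labels A proposes to B, labels B proposes to A)."""
    return (pseudo_labels_for_peer(dets_a, dets_b),
            pseudo_labels_for_peer(dets_b, dets_a))


# Toy detections standing in for Faster R-CNN and YOLO outputs on one image.
frcnn = [((0, 0, 10, 10), 0.95), ((20, 0, 30, 10), 0.97), ((40, 0, 50, 10), 0.5)]
yolo = [((1, 0, 11, 10), 0.92), ((60, 0, 70, 10), 0.99)]

to_yolo, to_frcnn = exchange(frcnn, yolo)
print(to_yolo)   # boxes added to YOLO's training set
print(to_frcnn)  # boxes added to Faster R-CNN's training set
```

In this toy case the first box of each detector overlaps the other's and is therefore not exchanged, the low-confidence Faster R-CNN box is dropped, and each detector passes one box the other missed. In a full training loop the exchanged boxes would be appended to each peer's training set before the next round of fine-tuning.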
Related papers
- Scale-aware Adaptive Supervised Network with Limited Medical Annotations [17.42211316792232]
SASNet is a dual-branch architecture that leverages both low-level and high-level feature representations through novel scale-aware adaptive reweight mechanisms. Our approach introduces three key methodological innovations, including the Scale-aware Adaptive Reweight strategy. SASNet achieves superior performance with limited labeled data, surpassing state-of-the-art semi-supervised methods.
arXiv Detail & Related papers (2026-01-02T23:55:17Z) - Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning [0.8890833546984916]
This paper presents a novel approach to neural network pruning by integrating a graph-based observation space into an AutoML framework. Our framework transforms the pruning process by introducing a graph representation of the target neural network. For the action space we transition from continuous pruning ratios to fine-grained binary action spaces.
arXiv Detail & Related papers (2025-09-04T15:05:05Z) - Towards Efficient General Feature Prediction in Masked Skeleton Modeling [59.46799426434277]
We propose a novel General Feature Prediction framework (GFP) for efficient mask skeleton modeling. Our key innovation is replacing conventional low-level reconstruction with high-level feature prediction that spans from local motion patterns to global semantic representations.
arXiv Detail & Related papers (2025-09-03T18:05:02Z) - SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN [0.2812395851874055]
We introduce SCoRE, a modular and cost-effective sentence-level relation extraction system. SCoRE enables easy PLM switching, requires no finetuning, and adapts smoothly to diverse corpora and KGs. We show that SCoRE matches or surpasses state-of-the-art methods while significantly reducing energy consumption.
arXiv Detail & Related papers (2025-07-09T14:33:07Z) - MGDFIS: Multi-scale Global-detail Feature Integration Strategy for Small Object Detection [12.838872442435527]
Small object detection in UAV imagery is crucial for applications such as search-and-rescue, traffic monitoring, and environmental surveillance. Existing multi-scale fusion methods help, but add computational burden and blur fine details. We propose a unified fusion framework that tightly couples global context with local detail to boost detection performance.
arXiv Detail & Related papers (2025-06-15T02:54:25Z) - Distributionally Robust Federated Learning with Client Drift Minimization [35.08453461129848]
DRDM is a distributionally robust optimization framework with dynamic regularization to mitigate client drift. DRDM frames the training as a min-max optimization problem aimed at maximizing performance for the worst-case client. Experiments show that DRDM significantly improves worst-case test accuracy while requiring fewer communication rounds.
arXiv Detail & Related papers (2025-05-21T11:05:56Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape [59.841889495864386]
In federated learning (FL), a cluster of local clients are chaired under the coordination of a global server.
Clients are prone to overfit into their own optima, which extremely deviates from the global objective.
FedSMOO adopts a dynamic regularizer to guarantee the local optima towards the global objective.
Our theoretical analysis indicates that FedSMOO achieves a fast $\mathcal{O}(1/T)$ convergence rate with low bound generalization.
arXiv Detail & Related papers (2023-05-19T10:47:44Z) - Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images [26.51970603200391]
This paper investigates optimizing the detection head based on the sparse convolution.
It suffers from inadequate integration of contextual information of tiny objects.
We propose a novel global context-enhanced adaptive sparse convolutional network.
arXiv Detail & Related papers (2023-03-25T14:42:50Z) - Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z) - Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.