REG: Refined Generalized Focal Loss for Road Asset Detection on Thai Highways Using Vision-Based Detection and Segmentation Models
- URL: http://arxiv.org/abs/2409.09877v2
- Date: Tue, 17 Sep 2024 01:30:22 GMT
- Title: REG: Refined Generalized Focal Loss for Road Asset Detection on Thai Highways Using Vision-Based Detection and Segmentation Models
- Authors: Teerapong Panboonyuen
- Abstract summary: This paper introduces a novel framework for detecting and segmenting critical road assets on Thai highways.
Integrated into state-of-the-art vision-based detection and segmentation models, the proposed method effectively addresses class imbalance and the challenges of localizing small, underrepresented road elements.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel framework for detecting and segmenting critical road assets on Thai highways using an advanced Refined Generalized Focal Loss (REG) formulation. Integrated into state-of-the-art vision-based detection and segmentation models, the proposed method effectively addresses class imbalance and the challenges of localizing small, underrepresented road elements, including pavilions, pedestrian bridges, information signs, single-arm poles, bus stops, warning signs, and concrete guardrails. To improve both detection and segmentation accuracy, a multi-task learning strategy is adopted, optimizing REG across multiple tasks. REG is further enhanced by incorporating a spatial-contextual adjustment term, which accounts for the spatial distribution of road assets, and a probabilistic refinement that captures prediction uncertainty in complex environments, such as varying lighting conditions and cluttered backgrounds. Our rigorous mathematical formulation demonstrates that REG minimizes localization and classification errors by applying adaptive weighting to hard-to-detect instances while down-weighting easier examples. Experimental results show a substantial performance improvement, achieving a mAP50 of 80.34 and an F1-score of 77.87, significantly outperforming conventional methods. This research underscores the capability of advanced loss function refinements to enhance the robustness and accuracy of road asset detection and segmentation, thereby contributing to improved road safety and infrastructure management. For an in-depth discussion of the mathematical background and related methods, please refer to previous work available at https://github.com/kaopanboonyuen/REG.
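The abstract describes a loss that up-weights hard-to-detect instances and down-weights easy ones. The sketch below illustrates that idea with the standard binary focal loss, not the paper's REG formulation, whose exact terms are not given on this page. The function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for a single prediction (illustrative only).

    p: predicted probability of the positive class, in (0, 1)
    y: ground-truth label, 0 or 1
    gamma: focusing parameter; larger values suppress easy examples more
    alpha: class-balance weight applied to the positive class
    """
    # Probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, 1e-12), 1.0)
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confidently correct) versus a hard positive (mostly wrong).
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)

# Plain cross-entropy on the same two predictions, for comparison.
ce_easy = -math.log(0.95)
ce_hard = -math.log(0.10)

print(f"focal: easy={easy:.6f} hard={hard:.6f}")
print(f"cross-entropy: easy={ce_easy:.6f} hard={ce_hard:.6f}")
```

With `gamma=0` and `alpha=1.0` the function reduces to ordinary cross-entropy for a positive example, which makes the effect of the modulating factor easy to compare.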
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- A Deeply Supervised Semantic Segmentation Method Based on GAN [9.441379867578332]
The proposed model integrates a generative adversarial network (GAN) framework into the traditional semantic segmentation model.
The effectiveness of our approach is demonstrated by a significant boost in performance on the road crack dataset.
arXiv Detail & Related papers (2023-10-06T08:22:24Z)
- Leveraging Topology for Domain Adaptive Road Segmentation in Satellite and Aerial Imagery [9.23555285827483]
Road segmentation algorithms fail to generalize to new geographical locations.
Road skeleton prediction serves as an auxiliary task to impose topological constraints.
For self-training, we filter out the noisy pseudo-labels by using a connectivity-based pseudo-labels refinement strategy.
arXiv Detail & Related papers (2023-09-27T12:50:51Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Learning Invariant Representation via Contrastive Feature Alignment for Clutter Robust SAR Target Recognition [10.993101256393679]
This letter proposes a solution called Contrastive Feature Alignment (CFA) to learn invariant representation for robust recognition.
The proposed CFA combines both classification and CWMSE losses to train the model jointly, which allows for the progressive learning of invariant target representation.
arXiv Detail & Related papers (2023-04-04T12:35:33Z)
- Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding [214.8003571700285]
Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression.
We design an entity-enhanced adaptive reconstruction network (EARN).
EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction.
arXiv Detail & Related papers (2022-07-18T05:30:45Z)
- SRRT: Exploring Search Region Regulation for Visual Object Tracking [58.68120400180216]
We propose a novel tracking paradigm, called Search Region Regulation Tracking (SRRT).
SRRT applies a proposed search region regulator to estimate an optimal search region dynamically for each frame.
On the large-scale LaSOT benchmark, SRRT improves SiamRPN++ and TransT with absolute gains of 4.6% and 3.1% in terms of AUC.
arXiv Detail & Related papers (2022-07-10T11:18:26Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.