Multi Self-supervised Pre-fine-tuned Transformer Fusion for Better
Intelligent Transportation Detection
- URL: http://arxiv.org/abs/2310.11307v1
- Date: Tue, 17 Oct 2023 14:32:49 GMT
- Title: Multi Self-supervised Pre-fine-tuned Transformer Fusion for Better
Intelligent Transportation Detection
- Authors: Juwu Zheng and Jiangtao Ren
- Abstract summary: Intelligent transportation systems combine advanced information technology to provide intelligent services such as monitoring, detection, and early warning for modern transportation.
Existing detection methods in intelligent transportation are limited in two respects.
First, there is a gap between model knowledge pre-trained on large-scale datasets and the knowledge required for the target task.
Second, most detection models follow a single-source learning pattern, which limits their learning ability.
- Score: 0.32634122554914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent transportation systems combine advanced information
technology to provide intelligent services such as monitoring, detection, and
early warning for modern transportation. Intelligent transportation detection
is the cornerstone of many intelligent traffic services, identifying task
targets through object detection methods. However, existing detection methods
in intelligent transportation are limited in two respects. First, there is a
gap between model knowledge pre-trained on large-scale datasets and the
knowledge required for the target task. Second, most detection models follow a
single-source learning pattern, which limits their learning ability. To
address these problems, we propose a Multi Self-supervised Pre-fine-tuned
Transformer Fusion (MSPTF) network, consisting of two steps: unsupervised
pre-fine-tuned domain knowledge learning and multi-model fusion target task
learning. In the first step, we introduce self-supervised learning methods
into transformer pre-fine-tuning, which reduces data costs and alleviates the
knowledge gap between the pre-trained model and the target task. In the second
step, we take into account the feature differences between model architectures
and pre-fine-tuning tasks and propose the Multi-model Semantic Consistency
Cross-attention Fusion (MSCCF) network, which combines features from different
transformer models by considering channel semantic consistency and
feature-vector semantic consistency, yielding more complete and suitable
fusion features for the detection task. We evaluated the proposed method on a
vehicle recognition dataset and a road disease detection dataset, achieving
improvements of 1.1%, 5.5%, and 4.2% over the baseline and 0.7%, 1.8%, and
1.7% over the state of the art, demonstrating the effectiveness of our method.
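The abstract describes MSCCF as cross-attention fusion of features from two differently pre-fine-tuned transformers, gated by semantic consistency. The exact formulation is not given in this summary, so the sketch below is an illustrative assumption: scaled dot-product cross-attention projects one model's features onto the other's token layout, and a per-channel cosine-agreement gate stands in for the paper's channel semantic consistency term.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(feat_a, feat_b):
    """Fuse (tokens, dim) features from two transformer backbones.

    feat_a queries feat_b, so the fused output keeps feat_a's token
    layout while borrowing feat_b's semantics. The channel gate is a
    hypothetical stand-in for MSCCF's consistency weighting, not the
    paper's definition.
    """
    d = feat_a.shape[-1]
    attn = softmax(feat_a @ feat_b.T / np.sqrt(d))  # (tokens_a, tokens_b)
    attended = attn @ feat_b                        # B projected onto A's tokens
    # Per-channel cosine agreement between the two sources, mapped to [0, 1].
    num = (feat_a * attended).sum(axis=0)
    den = (np.linalg.norm(feat_a, axis=0)
           * np.linalg.norm(attended, axis=0) + 1e-8)
    gate = (num / den + 1.0) / 2.0
    # Channels where the two models agree contribute more of the fused signal.
    return feat_a + gate * attended
```

Under this reading, channels on which the two pre-fine-tuned models disagree are down-weighted, so the detector head receives a fused feature dominated by mutually consistent semantics.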
Related papers
- SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors.
Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network.
In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z) - Advancing Automated Deception Detection: A Multimodal Approach to Feature Extraction and Analysis [0.0]
This research focuses on the extraction and combination of various features to enhance the accuracy of deception detection models.
By systematically extracting features from visual, audio, and text data, and experimenting with different combinations, we developed a robust model that achieved an impressive 99% accuracy.
arXiv Detail & Related papers (2024-07-08T14:59:10Z) - Remembering Transformer for Continual Learning [9.879896956915598]
We propose Remembering Transformer, inspired by the brain's Complementary Learning Systems.
Remembering Transformer employs a mixture-of-adapters architecture and a generative model-based novelty detection mechanism.
We conducted extensive experiments, including ablation studies on the novelty detection mechanism and model capacity of the mixture-of-adapters.
arXiv Detail & Related papers (2024-04-11T07:22:14Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Exploring Highly Quantised Neural Networks for Intrusion Detection in
Automotive CAN [13.581341206178525]
Machine learning-based intrusion detection models have been shown to successfully detect multiple targeted attack vectors.
In this paper, we present a case for a custom-quantised multi-layer perceptron (CQMLP) as a multi-class classification model.
We show that the 2-bit CQMLP model, when integrated as the IDS, can detect malicious attack messages with a very high accuracy of 99.9%.
arXiv Detail & Related papers (2024-01-19T21:11:02Z) - An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Meta-learning One-class Classifiers with Eigenvalue Solvers for
Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z) - SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object
Detection [39.29286021100541]
Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks.
For maximum accuracy, each detector is usually trained to solve one single task, and comes with a completely independent set of parameters.
This paper addresses the question: can task-specific detectors be trained and represented as a shared set of weights, plus a very small set of additional weights for each task?
arXiv Detail & Related papers (2021-01-04T22:24:06Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.