DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines
- URL: http://arxiv.org/abs/2411.09308v1
- Date: Thu, 14 Nov 2024 09:34:36 GMT
- Title: DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines
- Authors: Junqi Liu, Yun Zhang, Xiaoqi Wang, Xu Long, Sam Kwong,
- Abstract summary: Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision.
We propose a Deep Transformer based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM)
The accurately predicted JRD can be used reduce the coding bit rate while maintaining the accuracy of machine tasks.
- Score: 48.07705666485972
- License:
- Abstract: Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine vision oriented visual signal processing. In this paper, we propose a Deep Transformer based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used reduce the coding bit rate while maintaining the accuracy of machine tasks. Firstly, we model the JRD prediction as a multi-class classification and propose a DT-JRD prediction model that integrates an improved embedding, a content and distortion feature extraction, a multi-class classification and a novel learning strategy. Secondly, inspired by the perception property that machine vision exhibits a similar response to distortions near JRD, we propose an asymptotic JRD loss by using Gaussian Distribution-based Soft Labels (GDSL), which significantly extends the number of training labels and relaxes classification boundaries. Finally, we propose a DT-JRD based VCM to reduce the coding bits while maintaining the accuracy of object detection. Extensive experimental results demonstrate that the mean absolute error of the predicted JRD by the DT-JRD is 5.574, outperforming the state-of-the-art JRD prediction model by 13.1%. Coding experiments shows that comparing with the VVC, the DT-JRD based VCM achieves an average of 29.58% bit rate reduction while maintaining the object detection accuracy.
Related papers
- Run-time Introspection of 2D Object Detection in Automated Driving
Systems Using Learning Representations [13.529124221397822]
We introduce a novel introspection solution for 2D object detection based on Deep Neural Networks (DNNs)
We implement several state-of-the-art (SOTA) introspection mechanisms for error detection in 2D object detection, using one-stage and two-stage object detectors evaluated on KITTI and BDD datasets.
Our performance evaluation shows that the proposed introspection solution outperforms SOTA methods, achieving an absolute reduction in the missed error ratio of 9% to 17% in the BDD dataset.
arXiv Detail & Related papers (2024-03-02T10:56:14Z) - Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning [54.094272065609815]
We propose a lightweight 1D-CNN model, 1D-Justo-LiuNet, which outperforms state-of-the-art models in the hypespectral domain.
1D-Justo-LiuNet achieves the highest accuracy (0.93) with the smallest model size (4,563 parameters) among all tested models.
arXiv Detail & Related papers (2023-10-24T21:57:59Z) - DiffusionEngine: Diffusion Model is Scalable Data Engine for Object
Detection [41.436817746749384]
Diffusion Model is a scalable data engine for object detection.
DiffusionEngine (DE) provides high-quality detection-oriented training pairs in a single stage.
arXiv Detail & Related papers (2023-09-07T17:55:01Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs)
The performance of existing RS-CD methods is attributed to training on large annotated datasets.
This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z) - DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors
for Change Detection [31.125812018296127]
We introduce a novel approach for change detection by pre-training a Deno Diffusionising Probabilistic Model (DDPM)
DDPM learns the training data distribution by gradually converting training images into a Gaussian distribution using a Markov chain.
During inference (i.e., sampling), they can generate a diverse set of samples closer to the training distribution.
Experiments conducted on the LEVIR-CD, WHU-CD, DSIFN-CD, and CDD datasets demonstrate that the proposed DDPM-CD method significantly outperforms the existing change detection methods in terms of F1 score, I
arXiv Detail & Related papers (2022-06-23T17:58:29Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - Transformer Transforms Salient Object Detection and Camouflaged Object
Detection [43.79585695098729]
We conduct research on applying the transformer networks for salient object detection (SOD)
Specifically, we adopt the dense transformer backbone for fully supervised RGB image based SOD, RGB-D image pair based SOD, and weakly supervised SOD via scribble supervision.
As an extension, we also apply our fully supervised model to the task of camouflaged object detection (COD) for camouflaged object segmentation.
arXiv Detail & Related papers (2021-04-20T17:12:51Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.