Lightweight high-resolution Subject Matting in the Real World
- URL: http://arxiv.org/abs/2312.07100v1
- Date: Tue, 12 Dec 2023 09:27:57 GMT
- Title: Lightweight high-resolution Subject Matting in the Real World
- Authors: Peng Liu, Fanyi Wang, Jingwen Su, Yanhao Zhang, Guojun Qi
- Abstract summary: We construct a saliency object matting dataset, HRSOM, and a lightweight network, PSUNet.
For efficient inference on mobile deployment frameworks, we design a symmetric pixel shuffle module and a lightweight module, TRSU.
Compared to 13 SOD methods, the proposed PSUNet achieves the best objective performance on the high-resolution benchmark dataset.
- Score: 43.56357473163735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing salient object detection (SOD) methods struggle to deliver fast
inference and accurate results simultaneously in high-resolution scenes. They are
constrained by the quality of public datasets and by the lack of network modules
efficient enough for high-resolution images. To alleviate these issues, we
construct a saliency object matting dataset, HRSOM, and a lightweight network,
PSUNet. For efficient inference on mobile deployment frameworks, we design a
symmetric pixel shuffle module and a lightweight module, TRSU. Compared to 13
SOD methods, the proposed PSUNet achieves the best objective performance on the
high-resolution benchmark dataset. Its objective assessment results surpass
those of U$^2$Net, which has 10 times as many parameters as our network. On the
Snapdragon 8 Gen 2 mobile platform, inferring a single 640$\times$640 image
takes only 113 ms. On subjective assessment, results are better than the
industry benchmark, iOS 16's "Lift subject from background" feature.
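The abstract names a "symmetric pixel shuffle module" but gives no details. As a point of reference, here is a minimal NumPy sketch of the standard pixel unshuffle/shuffle pair (space-to-depth and its inverse, following the common channel-ordering convention) that such a symmetric module presumably builds on; the exact design inside PSUNet is not specified here.

```python
import numpy as np

def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Space-to-depth: (C, H, W) -> (C*r*r, H/r, W/r)."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0
    # Split each spatial axis into (coarse, fine) and fold the fine
    # factors into the channel dimension.
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r). Inverse of the above."""
    crr, h, w = x.shape
    c = crr // (r * r)
    # Unfold the r*r factor from the channels back into the spatial axes.
    x = x.reshape(c, r, r, h, w)
    return x.transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)
```

The symmetry is the point: unshuffling at the input lets the heavy computation run at reduced spatial resolution, and the matching shuffle at the output restores full resolution losslessly, which is friendly to mobile inference.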
Related papers
- R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions? [86.94616033250068]
R-Bench is a benchmark focused on the **Real-world Robustness of LMMs**.
We show that while LMMs can correctly handle the original reference images, their performance is not stable when faced with distorted images.
We hope that R-Bench will inspire improving the robustness of LMMs, **extending them from experimental simulations to the real-world application**.
arXiv Detail & Related papers (2024-10-07T20:12:08Z)
- Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context [3.061662434597098]
We propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the efficientDet model.
The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks.
Our experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
arXiv Detail & Related papers (2024-09-17T10:08:37Z)
- Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection [68.65338791283298]
Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video.
Traditional SOD methods are largely limited to low-resolution images, making them hard to adapt to high-resolution SOD.
In this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution.
arXiv Detail & Related papers (2023-08-07T17:49:04Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- A new public Alsat-2B dataset for single-image super-resolution [1.284647943889634]
The paper introduces a novel public remote sensing dataset (Alsat2B) of low and high spatial resolution images (10m and 2.5m respectively) for the single-image super-resolution task.
The high-resolution images are obtained through pan-sharpening.
The obtained results reveal that the proposed scheme is promising and highlight the challenges in the dataset.
arXiv Detail & Related papers (2021-03-21T10:47:38Z)
- MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition [39.60219801564855]
Existing portrait matting methods require auxiliary inputs that are costly to obtain or involve multiple stages that are computationally expensive.
We present a light-weight matting objective decomposition network (MODNet) for portrait matting in real-time with a single input image.
arXiv Detail & Related papers (2020-11-24T08:38:36Z)
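Several entries above (PSUNet, MODNet) concern matting, which predicts a soft alpha matte rather than a binary mask. For context only (this is the standard compositing model, not taken from any of these abstracts), a matte alpha places a foreground F over a new background B via I = alpha * F + (1 - alpha) * B. A minimal NumPy sketch:

```python
import numpy as np

def composite(alpha: np.ndarray, fg: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Alpha-composite: I = alpha * F + (1 - alpha) * B.

    alpha: (H, W) matte with values in [0, 1];
    fg, bg: (H, W, 3) images.
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * fg + (1.0 - a) * bg
```

Soft alpha values along hair and object boundaries are what distinguish "lift subject from background"-style matting from hard SOD segmentation masks.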
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.