SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices
- URL: http://arxiv.org/abs/2101.07996v1
- Date: Wed, 20 Jan 2021 06:47:41 GMT
- Title: SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices
- Authors: Xin Liu, Yuang Li, Josh Fromm, Yuntao Wang, Ziheng Jiang, Alex
Mariakakis, Shwetak Patel
- Abstract summary: We demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR.
SplitSR has a hybrid design consisting of standard convolutional blocks and lightweight residual blocks.
We deploy our model onto a smartphone in an app called ZoomSR to demonstrate the first-ever instance of on-device, deep learning-based SR.
- Score: 7.72178128781302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Super-resolution (SR) is a coveted image processing technique for mobile apps
ranging from basic camera apps to mobile health. Existing SR algorithms
rely on deep learning models with significant memory requirements, so they have
yet to be deployed on mobile devices and instead operate in the cloud to
achieve feasible inference time. This shortcoming prevents existing SR methods
from being used in applications that require near real-time latency. In this
work, we demonstrate state-of-the-art latency and accuracy for on-device
super-resolution using a novel hybrid architecture called SplitSR and a novel
lightweight residual block called SplitSRBlock. The SplitSRBlock supports
channel-splitting, allowing the residual blocks to retain spatial information
while reducing the computation in the channel dimension. SplitSR has a hybrid
design consisting of standard convolutional blocks and lightweight residual
blocks, allowing people to tune SplitSR for their computational budget. We
evaluate our system on a low-end ARM CPU, demonstrating both higher accuracy
and up to 5 times faster inference than previous approaches. We then deploy our
model onto a smartphone in an app called ZoomSR to demonstrate the first-ever
instance of on-device, deep learning-based SR. We conducted a user study in
which 15 participants assessed the perceived quality of images
post-processed by SplitSR. Relative to bilinear interpolation -- the existing
standard for on-device SR -- participants showed a statistically significant
preference when looking at both images (Z=-9.270, p<0.01) and text (Z=-6.486,
p<0.01).
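The core idea behind the SplitSRBlock lends itself to a short illustration. The PyTorch sketch below shows a generic channel-splitting residual block: only a fraction `alpha` of the channels is convolved while the rest pass through untouched, which keeps spatial information intact and cuts channel-dimension computation. The split ratio, layer composition, and class name are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ChannelSplitResidualBlock(nn.Module):
    """Generic channel-splitting residual block (illustrative, not the
    exact SplitSRBlock). Only a fraction `alpha` of the channels is
    convolved; the rest pass through unchanged, preserving spatial
    information while reducing computation in the channel dimension."""

    def __init__(self, channels: int, alpha: float = 0.25):
        super().__init__()
        self.active = int(channels * alpha)  # channels that get convolved
        self.body = nn.Sequential(
            nn.Conv2d(self.active, self.active, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(self.active, self.active, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension: process one part, keep the rest.
        a, b = torch.split(x, [self.active, x.shape[1] - self.active], dim=1)
        a = self.body(a) + a  # residual connection on the active channels
        return torch.cat([a, b], dim=1)

# Example: a 64-channel feature map; output shape equals input shape.
x = torch.randn(1, 64, 32, 32)
print(ChannelSplitResidualBlock(64, alpha=0.25)(x).shape)  # [1, 64, 32, 32]
```

In a full network, successive blocks would typically reorder or recombine the split so every channel is eventually processed; the paper's hybrid design additionally interleaves standard convolutional blocks with these lightweight blocks so the architecture can be tuned to a computational budget.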
Related papers
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We present the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, we build upon Mamba and devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with a Frequency Selection Module (FSM), a Vision State Space Module (VSSM), and a Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
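A minimal sketch of the general idea, assuming a single-scale variant (the published SAFM operates on multiple channel-split scales): features are pooled to a coarse grid to gather long-range context, re-expanded to full resolution, and used to gate the input element-wise. The module name, pool size, and layer choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveModulation(nn.Module):
    """Single-scale sketch of spatially-adaptive feature modulation:
    coarse context is gathered on a small grid, re-expanded to full
    resolution, and applied as a spatially varying gate on the input."""

    def __init__(self, channels: int, pool_size: int = 8):
        super().__init__()
        self.pool_size = pool_size
        # Depth-wise conv mixes context cheaply; 1x1 conv projects the gate.
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        ctx = F.adaptive_max_pool2d(x, self.pool_size)   # long-range context
        ctx = F.interpolate(self.dwconv(ctx), size=(h, w), mode="nearest")
        return x * F.gelu(self.proj(ctx))                # element-wise modulation
```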
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Learning to Super-Resolve Blurry Images with Events [62.61911224564196]
Super-Resolution from a single motion Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blurs and low spatial resolution.
We employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm.
We show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-27T13:46:42Z)
- LSR: A Light-Weight Super-Resolution Method [36.14816868964436]
LSR predicts the residual image between the interpolated low-resolution (ILR) and high-resolution (HR) images using a self-supervised framework.
It consists of three modules: 1) unsupervised generation of a pool of rich and diversified representations in the neighborhood of a target pixel, 2) supervised selection of the subset of that pool most relevant to the underlying super-resolution task, and 3) regression to predict the residual of the target pixel.
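The outer loop of this residual-prediction scheme is easy to sketch. Below, `regressor` is a hypothetical stand-in for LSR's full pipeline (representation pool, feature selection, regression); only the ILR-plus-residual structure is taken from the summary above.

```python
import torch
import torch.nn.functional as F

def lsr_style_reconstruction(lr, regressor, scale=2):
    """ILR-plus-residual reconstruction: bicubically upsample the LR input,
    then add a per-pixel residual predicted by `regressor` (a stand-in for
    LSR's pool/selection/regression pipeline)."""
    ilr = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
    return ilr + regressor(ilr)  # HR estimate = interpolated LR + predicted residual

# Trivial usage with a zero-residual placeholder regressor.
lr = torch.randn(1, 3, 32, 32)
print(lsr_style_reconstruction(lr, regressor=torch.zeros_like).shape)  # [1, 3, 64, 64]
```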
arXiv Detail & Related papers (2023-02-27T09:02:35Z)
- Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution [48.13296296287587]
We propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks.
We achieve real-time SR inference for 720p resolution with competitive SR performance on the GPUs and DSPs of mobile platforms.
arXiv Detail & Related papers (2022-07-25T23:59:19Z)
- ShuffleMixer: An Efficient ConvNet for Image Super-Resolution [88.86376017828773]
We propose ShuffleMixer, a lightweight image super-resolution network that explores large convolutions and channel split-shuffle operations.
Specifically, we develop a large depth-wise convolution and two projection layers based on channel splitting and shuffling as the basic component to mix features efficiently.
Experimental results demonstrate that the proposed ShuffleMixer is about 6x smaller than the state-of-the-art methods in terms of model parameters and FLOPs.
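A hedged sketch of the split-shuffle idea: half the channels pass through a large depth-wise convolution framed by two point-wise projections, then the halves are concatenated and shuffled so later blocks mix information across them. Kernel size, activation, and block name are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels across groups so the split halves mix over depth.
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ShuffleMixerStyleBlock(nn.Module):
    """Sketch of a split-shuffle mixing block: one half of the channels is
    transformed by a large depth-wise conv framed by two point-wise
    projections; the halves are then concatenated and shuffled."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        half = channels // 2
        self.mix = nn.Sequential(
            nn.Conv2d(half, half, 1),                          # projection in
            nn.Conv2d(half, half, kernel_size,
                      padding=kernel_size // 2, groups=half),  # large depth-wise conv
            nn.SiLU(inplace=True),
            nn.Conv2d(half, half, 1),                          # projection out
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)  # channel split
        return channel_shuffle(torch.cat([self.mix(a), b], dim=1))
```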
arXiv Detail & Related papers (2022-05-30T15:26:52Z)
- Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search [64.80878113422824]
We propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement.
With the proposed framework, we are the first to achieve real-time SR inference (tens of milliseconds per frame) for 720p resolution with competitive image quality.
arXiv Detail & Related papers (2021-08-18T06:47:31Z)
- On-Device Text Image Super Resolution [0.0]
We present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence.
The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling but also runs with an average inference time of 11.7 ms per image.
We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.
arXiv Detail & Related papers (2020-11-20T07:49:48Z)