EgoQR: Efficient QR Code Reading in Egocentric Settings
- URL: http://arxiv.org/abs/2410.05497v1
- Date: Mon, 7 Oct 2024 21:06:59 GMT
- Title: EgoQR: Efficient QR Code Reading in Egocentric Settings
- Authors: Mohsen Moslehpour, Yichao Lu, Pierce Chuang, Ashish Shenoy, Debojeet Chatterjee, Abhay Harpale, Srihari Jayakumar, Vikas Bhardwaj, Seonghyeon Nam, Anuj Kumar
- Abstract summary: We present EgoQR, a novel system for reading QR codes from egocentric images.
Our approach consists of two primary components: detection and decoding, designed to operate on high-resolution images on the device.
We evaluate our approach on a dataset of egocentric images, demonstrating a 34% improvement in code reading compared to existing state-of-the-art QR code readers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: QR codes have become ubiquitous in daily life, enabling rapid information exchange. With the increasing adoption of smart wearable devices, there is a need for efficient, frictionless QR code reading from an egocentric point of view. However, adapting existing phone-based QR code readers to egocentric images poses significant challenges. Compared to phones, where users can adjust position and framing, code reading from egocentric images brings unique challenges such as a wide field of view, code distortion, and the lack of visual feedback. Furthermore, wearable devices impose constraints on resources like compute, power, and memory. To address these challenges, we present EgoQR, a novel system for reading QR codes from egocentric images that is well suited for deployment on wearable devices. Our approach consists of two primary components, detection and decoding, designed to operate on high-resolution images on the device with minimal power consumption and added latency. The detection component efficiently locates potential QR codes within the image, while our enhanced decoding component extracts and interprets the encoded information. We incorporate innovative techniques to handle the specific challenges of egocentric imagery, such as varying perspectives, a wider field of view, and motion blur. We evaluate our approach on a dataset of egocentric images, demonstrating a 34% improvement in code reading compared to existing state-of-the-art QR code readers.
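The paper does not include code, but the two-stage structure described in the abstract can be sketched with off-the-shelf tools. Below is a minimal detect-then-decode pipeline using OpenCV's QRCodeDetector as a stand-in for EgoQR's own components; the function name, the crop margin, and the crop-then-decode strategy are illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def read_qr_codes(image_path: str) -> list[str]:
    """Sketch of a two-stage pipeline: (1) detect candidate QR regions on the
    full-resolution frame, (2) decode each cropped region separately.
    OpenCV's QRCodeDetector stands in for EgoQR's proprietary components."""
    img = cv2.imread(image_path)
    detector = cv2.QRCodeDetector()
    found, points = detector.detectMulti(img)          # stage 1: detection
    results = []
    if found:
        for quad in points:                            # one 4x2 corner array per code
            x, y, w, h = cv2.boundingRect(quad.astype(np.int32))
            pad = int(0.1 * max(w, h))                 # small margin around the code
            crop = img[max(y - pad, 0):y + h + pad, max(x - pad, 0):x + w + pad]
            text, _, _ = detector.detectAndDecode(crop)  # stage 2: decoding
            if text:
                results.append(text)
    return results
```

A deployment-oriented variant might run detection on a downscaled copy of the frame and decode only the corresponding full-resolution crops, which would be one plausible reading of the abstract's emphasis on low power and added latency.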
Related papers
- Reconstructive Visual Instruction Tuning [64.91373889600136]
Reconstructive visual instruction tuning (ROSS) is a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
It reconstructs latent representations of input images, avoiding directly regressing exact raw RGB values.
Empirically, ROSS consistently brings significant improvements across different visual encoders and language models.
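As a rough illustration of the summarized objective (supervising on latent features rather than raw RGB), a simplified loss might look like the sketch below; the function names and the plain MSE objective are assumptions for illustration, not ROSS's actual formulation.

```python
import torch
import torch.nn.functional as F

def latent_reconstruction_loss(predicted_visual_tokens: torch.Tensor,
                               images: torch.Tensor,
                               frozen_vision_encoder: torch.nn.Module) -> torch.Tensor:
    # Supervise the model's visual outputs against latent features of the
    # input image instead of regressing exact raw RGB values.
    with torch.no_grad():
        target_latents = frozen_vision_encoder(images)  # (B, N, D) features
    return F.mse_loss(predicted_visual_tokens, target_latents)
```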
arXiv Detail & Related papers (2024-10-12T15:54:29Z)
- See then Tell: Enhancing Key Information Extraction with Vision Grounding [54.061203106565706]
We introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding.
To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets.
arXiv Detail & Related papers (2024-09-29T06:21:05Z)
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation [38.281805719692194]
In the digital era, QR codes serve as a linchpin connecting virtual and physical realms.
Prevailing methods grapple with the intrinsic challenge of balancing customization and scannability.
This paper introduces Text2QR, a pioneering approach leveraging Stable Diffusion models.
arXiv Detail & Related papers (2024-03-11T06:03:31Z)
- SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation [39.84456803546365]
SSR-Encoder is a novel architecture designed for selectively capturing any subject from single or multiple reference images.
It responds to various query modalities including text and masks, without necessitating test-time fine-tuning.
Characterized by its model generalizability and efficiency, the SSR-Encoder adapts to a range of custom models and control modules.
arXiv Detail & Related papers (2023-12-26T14:39:11Z)
- Dual Associated Encoder for Face Restoration [68.49568459672076]
We propose a novel dual-branch framework named DAEFR to restore facial details from low-quality (LQ) images.
Our method introduces an auxiliary LQ branch that extracts crucial information from the LQ inputs.
We evaluate the effectiveness of DAEFR on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-08-14T17:58:33Z)
- Collaborative Auto-encoding for Blind Image Quality Assessment [17.081262827258943]
Blind image quality assessment (BIQA) is a challenging problem with important real-world applications.
Recent efforts to exploit the powerful representations of deep neural networks (DNNs) are hindered by the lack of subjectively annotated data.
This paper presents a novel BIQA method which overcomes this fundamental obstacle.
arXiv Detail & Related papers (2023-05-24T03:45:03Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
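The key idea, allocating variable-length codes by information density, can be caricatured with a histogram-entropy proxy, as in the sketch below; the patch size, thresholds, and code budgets are invented for illustration and are not DQ-VAE's actual mechanism.

```python
import numpy as np

def patch_entropy(patch: np.ndarray, bins: int = 32) -> float:
    # Shannon entropy of the intensity histogram as a crude information proxy
    # (assumes a grayscale image normalized to [0, 1])
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def allocate_code_lengths(image: np.ndarray, patch: int = 16) -> dict:
    # Busier (higher-entropy) regions get longer codes; flat regions get short ones.
    budgets = (1, 4, 16)        # hypothetical code lengths per region
    lengths = {}
    h, w = image.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            e = patch_entropy(image[y:y + patch, x:x + patch])
            lengths[(y, x)] = budgets[0] if e < 2.0 else budgets[1] if e < 4.0 else budgets[2]
    return lengths
```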
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision [93.90545426665999]
We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision.
A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well.
It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.
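The finding reads naturally as the pattern sketched below: a small trainable decoder cross-attending to features from a frozen pretrained encoder. The class name, dimensions, and layer counts are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SmallDecoderOnFrozenEncoder(nn.Module):
    # A small autoregressive text decoder trained on top of a frozen,
    # pretrained vision encoder (placeholder sizes for illustration).
    def __init__(self, vision_encoder: nn.Module, vocab_size: int = 32000, dim: int = 512):
        super().__init__()
        self.encoder = vision_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad = False                    # encoder stays frozen
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)  # deliberately small
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            memory = self.encoder(images)              # (B, N, dim) visual features
        tgt = self.embed(token_ids)
        seq_len = token_ids.size(1)
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask) # causal decoding over vision features
        return self.head(out)                          # next-token logits
```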
arXiv Detail & Related papers (2023-03-30T13:42:58Z)
- Medical visual question answering using joint self-supervised learning [8.817054025763325]
The encoder jointly embeds the image and text modalities with a self-attention mechanism.
The decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset.
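The described encoder corresponds to a common pattern: concatenate image and text tokens and run shared self-attention over the joint sequence, as in the generic sketch below. All sizes are placeholders; this is not the paper's specific model.

```python
import torch
import torch.nn as nn

class JointImageTextEncoder(nn.Module):
    # Shared self-attention over the concatenated image and text token streams
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, image_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N_img, dim), text_tokens: (B, N_txt, dim)
        joint = torch.cat([image_tokens, text_tokens], dim=1)
        return self.encoder(joint)   # every token attends across both modalities
```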
arXiv Detail & Related papers (2023-02-25T12:12:22Z)
- An End-to-end Method for Producing Scanning-robust Stylized QR Codes [45.35370585928748]
We propose a novel end-to-end method, named ArtCoder, to generate stylized QR codes.
The experimental results show that our stylized QR codes achieve high quality in both visual effect and scanning robustness.
arXiv Detail & Related papers (2020-11-16T09:38:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.