Scene Text Detection and Recognition "in light of" Challenging Environmental Conditions using Aria Glasses Egocentric Vision Cameras
- URL: http://arxiv.org/abs/2507.16330v1
- Date: Tue, 22 Jul 2025 08:12:00 GMT
- Title: Scene Text Detection and Recognition "in light of" Challenging Environmental Conditions using Aria Glasses Egocentric Vision Cameras
- Authors: Joseph De Mathia, Carlos Francisco Moreno-García
- Abstract summary: Scene Text Detection and Recognition (STDR) becomes a straightforward choice through the lens of egocentric vision. This paper investigates how environmental variables, such as lighting, distance, and resolution, affect the performance of STDR algorithms in real-world scenarios.
- Score: 0.7366405857677226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an era where wearable technology is reshaping applications, Scene Text Detection and Recognition (STDR) becomes a straightforward choice through the lens of egocentric vision. Leveraging Meta's Project Aria smart glasses, this paper investigates how environmental variables, such as lighting, distance, and resolution, affect the performance of state-of-the-art STDR algorithms in real-world scenarios. We introduce a novel, custom-built dataset captured under controlled conditions and evaluate two OCR pipelines: EAST with CRNN, and EAST with PyTesseract. Our findings reveal that resolution and distance significantly influence recognition accuracy, while lighting plays a less predictable role. Notably, image upscaling emerged as a key pre-processing technique, reducing Character Error Rate (CER) from 0.65 to 0.48. We further demonstrate the potential of integrating eye-gaze tracking to optimise processing efficiency by focusing on user attention zones. This work not only benchmarks STDR performance under realistic conditions but also lays the groundwork for adaptive, user-aware AR systems. Our contributions aim to inspire future research in robust, context-sensitive text recognition for assistive and research-oriented applications, such as asset inspection and nutrition analysis. The code is available at https://github.com/josepDe/Project_Aria_STR.
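The abstract reports the Character Error Rate (CER) dropping from 0.65 to 0.48 after image upscaling. CER is conventionally computed as the character-level edit distance between the predicted and reference strings, normalised by the reference length. A minimal sketch of that metric is below; the function names are illustrative and not taken from the paper's released code:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predicted: str, reference: str) -> float:
    # Character Error Rate: edit distance normalised by reference length.
    return levenshtein(predicted, reference) / max(len(reference), 1)

# One substitution against an 11-character reference:
print(cer("hallo world", "hello world"))
```

Under this definition a CER of 0.48 means roughly half the reference characters require an edit, so the reported pipelines remain error-prone even after upscaling.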
Related papers
- Point Cloud Recombination: Systematic Real Data Augmentation Using Robotic Targets for LiDAR Perception Validation [0.0]
Virtual simulations allow the generation of arbitrary scenes under controlled conditions but lack physical sensor characteristics. Real-world data offers true sensor realism but provides less control over influencing factors. Existing approaches address this problem with augmentation of real-world point cloud data by transferring objects between scenes. We propose Point Cloud Recombination, which systematically augments captured point cloud scenes by integrating point clouds acquired from physical target objects measured in controlled laboratory environments.
arXiv Detail & Related papers (2025-05-05T09:00:16Z)
- Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications [7.022872089444935]
Scene analysis and object recognition play a crucial role in augmented reality (AR).
A new approach is proposed that uses oriented bounding boxes with a detection and recognition deep network to improve performance and processing time.
Results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.
arXiv Detail & Related papers (2023-06-29T09:17:58Z)
- AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator [37.579437595742995]
We introduce AVOIDDS, a realistic object detection benchmark for the vision-based aircraft detect-and-avoid problem.
We provide a labeled dataset consisting of 72,000 photorealistic images of intruder aircraft with various lighting conditions.
We also provide an interface that evaluates trained models on slices of this dataset to identify changes in performance with respect to changing environmental conditions.
arXiv Detail & Related papers (2023-06-19T23:58:07Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning [64.92447072894055]
Infrared (IR) cameras are robust under adverse illumination and lighting conditions.
We propose a meta-learning framework to improve existing UDA methods.
We produce a state-of-the-art thermal detector for the KAIST and DSIAC datasets.
arXiv Detail & Related papers (2021-10-07T02:28:18Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
- Object-based Illumination Estimation with Rendering-aware Neural Networks [56.01734918693844]
We present a scheme for fast environment light estimation from the RGBD appearance of individual objects and their local image areas.
With the estimated lighting, virtual objects can be rendered in AR scenarios with shading that is consistent to the real scene.
arXiv Detail & Related papers (2020-08-06T08:23:19Z)
- Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)
- Image Processing Based Scene-Text Detection and Recognition with Tesseract [0.0]
This project focuses on word detection and recognition in natural images.
The project achieved a correct character recognition rate of more than 80%.
This paper outlines the stages of development, the major challenges and some of the interesting findings of the project.
arXiv Detail & Related papers (2020-04-17T06:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.