Adaptive Camera Sensor for Vision Models
- URL: http://arxiv.org/abs/2503.02170v1
- Date: Tue, 04 Mar 2025 01:20:23 GMT
- Title: Adaptive Camera Sensor for Vision Models
- Authors: Eunsu Baek, Sunghwan Han, Taesik Gong, Hyung-Sin Kim
- Abstract summary: Lens is a novel camera sensor control method that enhances model performance by capturing high-quality images from the model's perspective. At its core, Lens utilizes VisiT, a training-free, model-specific quality indicator that evaluates individual unlabeled samples at test time. To validate Lens, we introduce ImageNet-ES Diverse, a new benchmark dataset capturing natural perturbations from varying sensor and lighting conditions.
- Score: 4.566795168995489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain shift remains a persistent challenge in deep-learning-based computer vision, often requiring extensive model modifications or large labeled datasets to address. Inspired by human visual perception, which adjusts input quality through corrective lenses rather than over-training the brain, we propose Lens, a novel camera sensor control method that enhances model performance by capturing high-quality images from the model's perspective rather than relying on traditional human-centric sensor control. Lens is lightweight and adapts sensor parameters to specific models and scenes in real-time. At its core, Lens utilizes VisiT, a training-free, model-specific quality indicator that evaluates individual unlabeled samples at test time using confidence scores without additional adaptation costs. To validate Lens, we introduce ImageNet-ES Diverse, a new benchmark dataset capturing natural perturbations from varying sensor and lighting conditions. Extensive experiments on both ImageNet-ES and our new ImageNet-ES Diverse show that Lens significantly improves model accuracy across various baseline schemes for sensor control and model modification while maintaining low latency in image captures. Lens effectively compensates for large model size differences and integrates synergistically with model improvement techniques. Our code and dataset are available at github.com/Edw2n/Lens.git.
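To make the mechanism concrete, here is a minimal sketch of the Lens loop, assuming max-softmax confidence as VisiT's quality score; `capture` and the candidate parameter grid are hypothetical stand-ins for a real camera control API:

```python
import torch
import torch.nn.functional as F

def visit_score(model: torch.nn.Module, image: torch.Tensor) -> float:
    """Training-free quality score: the model's own confidence on one
    unlabeled capture. Max-softmax stands in for the confidence score;
    the paper's exact indicator may differ."""
    with torch.no_grad():
        logits = model(image.unsqueeze(0))           # (1, num_classes)
        return F.softmax(logits, dim=-1).max().item()

def lens_select(model, capture, candidate_params):
    """Capture the scene under each candidate sensor setting and keep the
    one the model itself scores highest. `capture(params)` is a hypothetical
    camera API returning a (C, H, W) image tensor."""
    best_params, best_img, best_score = None, None, float("-inf")
    for params in candidate_params:
        img = capture(params)                        # one capture per setting
        score = visit_score(model, img)
        if score > best_score:
            best_params, best_img, best_score = params, img, score
    return best_params, best_img
```

Because the score needs only a forward pass per candidate capture, no labels or adaptation steps are involved, consistent with the training-free claim.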
Related papers
- Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models [58.98742597810023]
Vision models must be robust to disturbances such as noise or blur.
This paper studies two datasets of blur corruptions, which we denote OpticsBench and LensCorruptions.
Evaluations for image classification and object detection on ImageNet and MSCOCO show that for a variety of different pre-trained models, the performance on OpticsBench and LensCorruptions varies significantly.
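OpticsBench's aberration kernels are not reproduced here; as a stand-in, the sketch below follows the same evaluation pattern by measuring how a pretrained classifier's top-1 accuracy degrades under Gaussian blur of increasing severity:

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def accuracy_under_blur(model, loader, kernel_size=9, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Top-1 accuracy of `model` as synthetic blur severity grows.
    Gaussian blur stands in for the optical-aberration corruptions."""
    results = {}
    for sigma in sigmas:
        correct = total = 0
        for images, labels in loader:
            blurred = TF.gaussian_blur(images, kernel_size, [sigma, sigma])
            preds = model(blurred).argmax(dim=-1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        results[sigma] = correct / total
    return results
```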
arXiv Detail & Related papers (2025-04-25T17:23:47Z)
- Deep-BrownConrady: Prediction of Camera Calibration and Distortion Parameters Using Deep Learning and Synthetic Data [11.540349678846937]
This research addresses the challenge of camera calibration and distortion parameter prediction from a single image. A deep learning model, trained on a mix of real and synthetic images, can accurately predict camera and lens parameters from a single image.
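For reference, the Brown-Conrady model named in the title maps undistorted normalized coordinates to distorted ones through radial (k1, k2, k3) and tangential (p1, p2) coefficients; these are the distortion parameters such a network regresses:

```python
import numpy as np

def brown_conrady(x, y, k1, k2, k3, p1, p2):
    """Apply Brown-Conrady distortion to normalized image coordinates
    (x, y); works elementwise on NumPy arrays. The five coefficients are
    the distortion parameters such a model predicts."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d
```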
arXiv Detail & Related papers (2025-01-24T14:12:04Z)
- Explorations in Self-Supervised Learning: Dataset Composition Testing for Object Classification [0.0]
We investigate the impact of sampling and pretraining using datasets with different image characteristics on the performance of self-supervised learning (SSL) models for object classification. We find that depth-pretrained models are more effective on low-resolution images, while RGB-pretrained models perform better on higher-resolution images.
arXiv Detail & Related papers (2024-12-01T11:21:01Z)
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn sensor-invariant features.
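The summary does not give the loss; one plausible reading (function and variable names hypothetical) pairs the usual reconstruction loss with a penalty on feature disagreement between two sensors viewing the same scene:

```python
import torch
import torch.nn.functional as F

def sensor_consistency_loss(model, raw_a, raw_b, clean):
    """Denoising loss plus a penalty that pushes the model toward
    sensor-invariant features. `raw_a`/`raw_b` are the same scene captured
    by two sensors; `model` returns (denoised, features). A hypothetical
    sketch, not the paper's exact formulation."""
    out_a, feat_a = model(raw_a)
    out_b, feat_b = model(raw_b)
    recon = F.l1_loss(out_a, clean) + F.l1_loss(out_b, clean)
    consistency = F.mse_loss(feat_a, feat_b)   # sensor-invariance term
    return recon + 0.1 * consistency
```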
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
- DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture [13.412728770638465]
We present an encoder-decoder model that adapts to distortions in wide-angle lenses by leveraging the physical characteristics defined by the radial distortion profile. In contrast to the original model, which only performs classification tasks, we introduce a U-Net architecture, DarSwin-Unet, designed for pixel-level tasks. Our approach enhances the model's capability to handle pixel-level tasks in wide-angle fisheye images, making it more effective for real-world applications.
arXiv Detail & Related papers (2024-07-24T14:52:18Z)
- An Ensemble Model for Distorted Images in Real Scenarios [0.0]
In this paper, we apply the object detector YOLOv7 to detect distorted images from the CDCOCO dataset.
Through carefully designed optimizations, our model achieves excellent performance on the CDCOCO test set.
Our denoising detection model denoises and repairs distorted images, making it useful in a variety of real-world scenarios and environments.
arXiv Detail & Related papers (2023-09-26T15:12:55Z)
- Neural Lens Modeling [50.57409162437732]
NeuroLens is a neural lens model for distortion and vignetting that can be used for point projection and ray casting.
It can perform pre-capture calibration with classical calibration targets, and can later support calibration or refinement during 3D reconstruction.
The model generalizes across many lens types and is trivial to integrate into existing 3D reconstruction and rendering systems.
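One way to realize point projection with a learned lens (a sketch under an assumed parameterization, not the authors' architecture) is a pinhole projection whose normalized coordinates receive a learned distortion residual:

```python
import torch
import torch.nn as nn

class NeuralLensProjection(nn.Module):
    """Pinhole projection followed by a learned 2D distortion residual.
    A sketch of the idea; the actual NeuroLens parameterization differs."""
    def __init__(self, fx, fy, cx, cy):
        super().__init__()
        self.register_buffer("K", torch.tensor([[fx, 0., cx],
                                                [0., fy, cy],
                                                [0., 0., 1.]]))
        self.distort = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                                     nn.Linear(32, 2))

    def forward(self, points_cam):                   # (N, 3) camera-space points
        xy = points_cam[:, :2] / points_cam[:, 2:3]  # perspective divide
        xy = xy + self.distort(xy)                   # learned lens distortion
        ones = torch.ones_like(xy[:, :1])
        return (self.K @ torch.cat([xy, ones], dim=1).T).T[:, :2]  # pixel coords
```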
arXiv Detail & Related papers (2023-04-10T20:09:17Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We employ plain vision transformers with about 100 million parameters, making the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- Unrolled Primal-Dual Networks for Lensless Cameras [0.45880283710344055]
We show that learning a supervised primal-dual reconstruction method results in image quality matching the state of the art in the literature.
This improvement stems from our finding that embedding learnable forward and adjoint models in a learned primal-dual optimization framework can even improve the quality of reconstructed images.
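A sketch of that unrolled structure, with small CNNs updating the dual and primal variables and the forward/adjoint operators themselves left learnable (layer sizes and iteration count are illustrative; a real lensless forward model maps between different grids):

```python
import torch
import torch.nn as nn

class UnrolledPrimalDual(nn.Module):
    """Unrolled primal-dual reconstruction with learnable forward/adjoint
    operators embedded in the iterations. Assumes measurement and image
    share a grid for simplicity."""
    def __init__(self, n_iters=5):
        super().__init__()
        conv = lambda cin: nn.Sequential(
            nn.Conv2d(cin, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))
        self.forward_ops = nn.ModuleList(conv(1) for _ in range(n_iters))   # learnable A
        self.adjoint_ops = nn.ModuleList(conv(1) for _ in range(n_iters))   # learnable A*
        self.dual_nets = nn.ModuleList(conv(3) for _ in range(n_iters))
        self.primal_nets = nn.ModuleList(conv(2) for _ in range(n_iters))

    def forward(self, measurement):                  # (B, 1, H, W)
        primal = torch.zeros_like(measurement)
        dual = torch.zeros_like(measurement)
        for A, At, D, P in zip(self.forward_ops, self.adjoint_ops,
                               self.dual_nets, self.primal_nets):
            dual = dual + D(torch.cat([dual, A(primal), measurement], dim=1))
            primal = primal + P(torch.cat([primal, At(dual)], dim=1))
        return primal
```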
arXiv Detail & Related papers (2022-03-08T19:21:39Z)
- Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution [51.274657266928315]
We propose a PSF-aware plug-and-play deep network that takes the aberrant image and PSF map as input and produces the latent high-quality version by incorporating lens-specific deep priors.
Specifically, we pre-train a base model from a set of diverse lenses and then adapt it to a given lens by quickly refining the parameters.
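The adaptation stage reads like a short fine-tuning loop; a hypothetical sketch, assuming the network conditions on the PSF map by channel concatenation:

```python
from itertools import cycle, islice
import torch
import torch.nn.functional as F

def adapt_to_lens(model, optimizer, lens_pairs, steps=100):
    """Quickly refine a pre-trained base deconvolution model on one target
    lens. `lens_pairs` is an iterable of (aberrant, psf_map, sharp) batches
    for that lens. A hypothetical sketch, not the authors' code."""
    model.train()
    for aberrant, psf_map, sharp in islice(cycle(lens_pairs), steps):
        restored = model(torch.cat([aberrant, psf_map], dim=1))
        loss = F.l1_loss(restored, sharp)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```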
arXiv Detail & Related papers (2021-04-07T12:00:38Z)
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
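The self-supervision signal can be written down directly; a sketch with hypothetical names, assuming `rectify` predicts distortion parameters and `warp` is the differentiable warping module:

```python
import torch
import torch.nn.functional as F

def sir_loss(rectify, warp, img_lens_a, img_lens_b):
    """Self-supervised rectification: the same scene seen through two lenses
    must rectify to the same image, and re-distorting the rectified image
    must reproduce the input. Function names are hypothetical."""
    params_a, params_b = rectify(img_lens_a), rectify(img_lens_b)
    rect_a = warp(img_lens_a, params_a, inverse=True)      # undistort
    rect_b = warp(img_lens_b, params_b, inverse=True)
    consistency = F.l1_loss(rect_a, rect_b)                # same-scene constraint
    recon = F.l1_loss(warp(rect_a, params_a), img_lens_a)  # re-distortion check
    return consistency + recon
```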
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.