Related papers: Learning to Sense for Driving: Joint Optics-Sensor-Model Co-Design for Semantic Segmentation

URL: http://arxiv.org/abs/2512.20815v2
Date: Thu, 25 Dec 2025 20:26:31 GMT
Title: Learning to Sense for Driving: Joint Optics-Sensor-Model Co-Design for Semantic Segmentation
Authors: Reeshad Khan, John Gauch
Abstract summary: Traditional autonomous driving pipelines decouple camera design from downstream perception. We present a task-driven co-design framework that unifies optics, sensor modeling, and lightweight semantic segmentation networks. Our system integrates realistic cellphone-scale lens models, learnable color filter arrays, Poisson-Gaussian noise processes, and quantization.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional autonomous driving pipelines decouple camera design from downstream perception, relying on fixed optics and handcrafted ISPs that prioritize human-viewable imagery rather than machine semantics. This separation discards information during demosaicing, denoising, or quantization, while forcing models to adapt to sensor artifacts. We present a task-driven co-design framework that unifies optics, sensor modeling, and lightweight semantic segmentation networks into a single end-to-end RAW-to-task pipeline. Building on DeepLens [19], our system integrates realistic cellphone-scale lens models, learnable color filter arrays, Poisson-Gaussian noise processes, and quantization, all optimized directly for segmentation objectives. Evaluations on KITTI-360 show consistent mIoU improvements over fixed pipelines, with optics modeling and CFA learning providing the largest gains, especially for thin or low-light-sensitive classes. Importantly, these robustness gains are achieved with a compact ~1M-parameter model running at ~28 FPS, demonstrating edge deployability. Visual and quantitative analyses further highlight how co-designed sensors adapt acquisition to semantic structure, sharpening boundaries and maintaining accuracy under blur, noise, and low bit-depth. Together, these findings establish full-stack co-optimization of optics, sensors, and networks as a principled path toward efficient, reliable, and deployable perception in autonomous systems.
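
Illustrative sketch (not from the paper): the abstract names Poisson-Gaussian noise and quantization as parts of the sensor model. The standard-library Python below shows one common way such a stage is simulated for a single pixel. The function names and the parameter values (`full_well`, `read_sigma`, `bits`) are hypothetical choices for illustration, and the authors' differentiable implementation may differ substantially.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson sample with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sense_pixel(irradiance, rng, full_well=200.0, read_sigma=2.0, bits=8):
    """Map normalized irradiance in [0, 1] to a quantized digital number.

    full_well, read_sigma, and bits are assumed illustrative values.
    """
    # Expected photoelectron count for this pixel.
    expected_electrons = max(0.0, min(1.0, irradiance)) * full_well
    # Signal-dependent shot noise (Poisson component).
    electrons = sample_poisson(expected_electrons, rng)
    # Signal-independent read noise (Gaussian component).
    electrons += rng.gauss(0.0, read_sigma)
    # Clip to the converter range, then quantize to 2**bits levels.
    normalized = max(0.0, min(1.0, electrons / full_well))
    levels = (1 << bits) - 1
    return round(normalized * levels)


# Usage: simulate a short ramp of five pixels at 4-bit depth.
rng = random.Random(0)
ramp = [sense_pixel(step / 4, rng, bits=4) for step in range(5)]
```

Lowering `bits` or `full_well` in this sketch makes the output coarser and noisier, which is the kind of degradation (noise, low bit-depth) the abstract says the co-designed pipeline is evaluated under.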

Related papers

CoWTracker: Tracking by Warping instead of Correlation [53.834673070954494]
We propose a dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP.
arXiv Detail & Related papers (2026-02-04T18:58:59Z)
Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding [33.013553875034795]
We consider the problem of active 3D imaging using single-shot structured light systems. Traditional structured light methods typically decode depth correspondences through pixel-domain matching algorithms. Inspired by recent advances in neural feature matching, we propose a learning-based structured light decoding framework.
arXiv Detail & Related papers (2025-12-16T02:47:38Z)
SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding [0.0]
This paper proposes a novel framework, named SPORTS, for holistic scene understanding. It integrates Video Panoptic Segmentation (VPS), Visual Odometry (VO), and Scene Rendering tasks into an iterative and unified perspective. Our attention-based feature fusion outperforms most existing state-of-the-art synthesis methods on the odometry, tracking, segmentation, and novel view tasks.
arXiv Detail & Related papers (2025-10-14T17:28:19Z)
LensNet: An End-to-End Learning Framework for Empirical Point Spread Function Modeling and Lensless Imaging Reconstruction [32.85180149439811]
Lensless imaging stands out as a promising alternative to conventional lens-based systems. Traditional lensless techniques often require explicit calibrations and extensive pre-processing. We propose LensNet, an end-to-end deep learning framework that integrates spatial-domain and frequency-domain representations.
arXiv Detail & Related papers (2025-05-03T09:11:52Z)
FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment. In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation. We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement [58.72667941107544]
A typical framework is to simultaneously estimate the illumination and reflectance, but they disregard the scene-level contextual information encapsulated in feature spaces. We develop a new context-sensitive decomposition network architecture to exploit the scene-level contextual dependencies on spatial scales. We develop a lightweight CSDNet (named LiteCSDNet) by reducing the number of channels.
arXiv Detail & Related papers (2021-12-09T06:25:30Z)
RRNet: Relational Reasoning Network with Parallel Multi-scale Attention for Salient Object Detection in Optical Remote Sensing Images [82.1679766706423]
Salient object detection (SOD) for optical remote sensing images (RSIs) aims at locating and extracting visually distinctive objects/regions from the optical RSIs. We propose a relational reasoning network with parallel multi-scale attention for SOD in optical RSIs. Our proposed RRNet outperforms the existing state-of-the-art SOD competitors both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-10-27T07:18:32Z)
Lite-HDSeg: LiDAR Semantic Segmentation Using Lite Harmonic Dense Convolutions [2.099922236065961]
We present Lite-HDSeg, a novel real-time convolutional neural network for semantic segmentation of full 3D LiDAR point clouds. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches that can run in real time.
arXiv Detail & Related papers (2021-03-16T04:54:57Z)
Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image can be of practical interest for fundamental computer vision problems. We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
Learning a Probabilistic Strategy for Computational Imaging Sensor Selection [16.553234762932938]
We propose a physics-constrained, fully differentiable autoencoder that learns a probabilistic sensor-sampling strategy for optimized sensor design. The proposed method learns a system's preferred sampling distribution that characterizes the correlations between different sensor selections as a binary, fully-connected Ising model.
arXiv Detail & Related papers (2020-03-23T17:52:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.