Related papers: LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion

URL: http://arxiv.org/abs/2410.03686v2
Date: Tue, 8 Oct 2024 15:11:49 GMT
Title: LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
Authors: Meenakshi Subhash Chippa, Prakash Chandra Chhipa, Kanjar De, Marcus Liwicki, Rajkumar Saini
Abstract summary: We show that Log Conformal Maps (LCM) approximates perspective distortion with fewer parameters and reduced computational complexity. LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches the state-of-the-art performance in mitigating perspective distortion.
Score: 6.486569431242123
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods first remove distortion and then learn the vision task, a multi-step process that often compromises performance. Recent work leverages the Möbius transform for mitigating perspective distortions (MPD) to synthesize perspective distortions without estimating camera parameters. The Möbius transform requires tuning multiple interdependent parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method leveraging the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation complemented with experiments to demonstrate that LCM, with fewer parameters, approximates MPD. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches the state-of-the-art performance in mitigating perspective distortion over multiple benchmarks, namely ImageNet-PD, ImageNet-E, and ImageNet-X. Further, LCM integrates seamlessly with person re-identification and improves its performance. Source code is made publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
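
To make the parameter-count contrast in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the single `shift` parameter, and the exact form log(z + shift) are assumptions chosen for illustration, and the paper's actual LCM formulation may differ. Image coordinates are treated as complex numbers x + iy, and the sketch compares a general Möbius transform (four complex parameters) with a logarithm-based conformal map (one parameter here).

```python
import numpy as np


def pixel_grid(height, width):
    """Return pixel coordinates as complex numbers x + iy scaled to [-1, 1]."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    return xs + 1j * ys


def mobius_map(z, a, b, c, d):
    """Moebius transform (a*z + b) / (c*z + d) with four complex parameters."""
    if np.isclose(a * d - b * c, 0):
        raise ValueError("ad - bc must be non-zero for a valid Moebius transform")
    z = np.asarray(z, dtype=complex)
    return (a * z + b) / (c * z + d)


def log_map(z, shift):
    """Hypothetical log-based conformal map log(z + shift).

    Assumption for this sketch: `shift` is chosen so that every shifted
    point has a positive real part, which keeps the grid away from the
    branch cut of the principal logarithm.
    """
    w = np.asarray(z, dtype=complex) + shift
    if np.any(w.real <= 0):
        raise ValueError("shift must move all points into the right half-plane")
    return np.log(w)


if __name__ == "__main__":
    grid = pixel_grid(224, 224)
    warped_mobius = mobius_map(grid, 1.0, 0.0, 0.2j, 1.0)
    warped_log = log_map(grid, 2.0)
    print(warped_mobius.shape, warped_log.shape)
```

Both maps are analytic on the grid, so they preserve angles locally. To warp an actual image, the resulting coordinates would still need to be rescaled to the sampling range and passed to an interpolation routine, which this sketch leaves out.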

Related papers

Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization [27.97760974010369]
We show an approach to reduce the effect of compression on a task loss using the distance between features as a distortion metric. We simplify the RDO formulation to make the distortion term computable using block-based encoders. We show up to 10% bit-rate savings for the same computer vision accuracy compared to RDO based on SSE.
arXiv Detail & Related papers (2025-04-03T02:11:26Z)
Scalable Visual State Space Model with Fractal Scanning [16.077348474371547]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer models. We propose using fractal scanning curves for patch serialization. We validate our method in image classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2024-05-23T12:12:11Z)
Möbius Transform for Mitigating Perspective Distortions in Representation Learning [43.86985901138407]
Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships in images. We propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of Möbius transform. We present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this new dataset.
arXiv Detail & Related papers (2024-03-07T15:39:00Z)
Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block. The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with relation to patch embedding, projection, the feed-forward network, up sampling and skip connections. CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature process phase. Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on. Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object result. A recent transformer-based image recognition model also shows consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
Image Deformation Estimation via Multi-Objective Optimization [13.159751065619544]
The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image. It is challenging to fit the model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape.
arXiv Detail & Related papers (2021-06-08T06:52:12Z)
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography. Existing deep homography methods neglect the explicit formulation of correspondences between the inputs, which leads to degraded accuracy in cross-resolution challenges. We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lenses should be the same. We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters. Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
ProAlignNet : Unsupervised Learning for Progressively Aligning Noisy Contours [12.791313859673187]
"ProAlignNet" accounts for large scale misalignments and complex transformations between the contour shapes. It learns by training with a novel loss function which is derived as an upper bound of a proximity-sensitive and local shape-dependent similarity metric. In two real-world applications, the proposed models consistently perform superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-05-23T14:56:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.