LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
- URL: http://arxiv.org/abs/2410.03686v2
- Date: Tue, 8 Oct 2024 15:11:49 GMT
- Title: LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
- Authors: Meenakshi Subhash Chippa, Prakash Chandra Chhipa, Kanjar De, Marcus Liwicki, Rajkumar Saini,
- Abstract summary: We show that Log Conformal Maps (LCM) approximates perspective distortion with fewer parameters and reduced computational complexity.
LCM integrates well with supervised and self-supervised representation learning, outperform standard models, and matches the state-of-the-art performance in mitigating perspective distortion.
- Score: 6.486569431242123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. The current distortion correction methods involve removing distortion and learning vision tasks, thus making it a multi-step process, often compromising performance. Recent work leverages the M\"obius transform for mitigating perspective distortions (MPD) to synthesize perspective distortions without estimating camera parameters. M\"obius transform requires tuning multiple interdependent and interrelated parameters and involving complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method leveraging the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation complemented with experiments to demonstrate that LCM with fewer parameters approximates the MPD. We show that LCM integrates well with supervised and self-supervised representation learning, outperform standard models, and matches the state-of-the-art performance in mitigating perspective distortion over multiple benchmarks, namely Imagenet-PD, Imagenet-E, and Imagenet-X. Further LCM demonstrate seamless integration with person re-identification and improved the performance. Source code is made publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
Related papers
- Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts [7.136884388888679]
We introduce the Unified Rectification Framework (UniRect), a comprehensive approach that addresses these practical tasks from a consistent distortion rectification perspective.<n>Our approach incorporates various task-specific inverse problems into a general distortion model by simulating different types of lenses.<n>Our models have achieved state-of-the-art performance compared with other up-to-date methods.
arXiv Detail & Related papers (2025-12-21T12:33:44Z) - Unsupervised Multi-View Visual Anomaly Detection via Progressive Homography-Guided Alignment [14.782512101141016]
Unsupervised visual anomaly detection from multi-view images presents a significant challenge.<n>ViewSense-AD (VSAD) learns viewpoint-invariant representations by explicitly modeling geometric consistency across views.<n>Anomaly detection is performed by comparing multi-level features from the diffusion model against a learned memory bank of normal prototypes.
arXiv Detail & Related papers (2025-11-24T05:01:16Z) - Visual Odometry with Transformers [68.453547770334]
We introduce Visual odometry Transformer (VoT), which processes sequences of monocular frames by extracting features.<n>Unlike prior methods, VoT directly predicts camera motion without estimating dense geometry and relies solely on camera poses for supervision.<n>VoT scales effectively with larger datasets, benefits substantially from stronger pre-trained backbones, generalizes across diverse camera motions and calibration settings, and outperforms traditional methods while running more than 3 times faster.
arXiv Detail & Related papers (2025-10-02T17:00:14Z) - RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization [50.75654397516163]
We propose RelayFormer, a unified framework that adapts to varying resolutions and modalities.<n> RelayFormer partitions inputs into fixed-size sub-images and introduces Global-Local Relay (GLR) tokens.<n>This enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts.
arXiv Detail & Related papers (2025-08-13T03:35:28Z) - Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization [27.97760974010369]
We show an approach to reduce the effect of compression on a task loss using the distance between features as a distortion metric.
We simplify the RDO formulation to make the distortion term computable using block-based encoders.
We show up to 10% bit-rate savings for the same computer vision accuracy compared to RDO based on SSE.
arXiv Detail & Related papers (2025-04-03T02:11:26Z) - Scalable Visual State Space Model with Fractal Scanning [16.077348474371547]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer models.
We propose using fractal scanning curves for patch serialization.
We validate our method in image classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2024-05-23T12:12:11Z) - Möbius Transform for Mitigating Perspective Distortions in Representation Learning [43.86985901138407]
Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships in images.
We propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of M"obius transform.
We present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this new dataset.
arXiv Detail & Related papers (2024-03-07T15:39:00Z) - Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the textbfCorner-to-Center transformer-based Context Model (C$3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z) - Spatially-Adaptive Feature Modulation for Efficient Image
Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
Proposed method is $3times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - Optimizing Vision Transformers for Medical Image Segmentation and
Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with relation to patch embedding, projection, the feed-forward network, up sampling and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature process phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution vision tasks with transformers, it directly translates the image feature map into the object result.
Recent transformer-based image recognition model andTT show consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - Image Deformation Estimation via Multi-Objective Optimization [13.159751065619544]
Free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image.
It is challenging to fit the model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape.
arXiv Detail & Related papers (2021-06-08T06:52:12Z) - LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglecting the explicit formulation of correspondences between them, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from
Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lens should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z) - ProAlignNet : Unsupervised Learning for Progressively Aligning Noisy
Contours [12.791313859673187]
"ProAlignNet" accounts for large scale misalignments and complex transformations between the contour shapes.
It learns by training with a novel loss function which is derived an upperbound of a proximity-sensitive and local shape-dependent similarity metric.
In two real-world applications, the proposed models consistently perform superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-05-23T14:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.