Features extraction for image identification using computer vision
- URL: http://arxiv.org/abs/2507.18650v1
- Date: Tue, 22 Jul 2025 10:43:52 GMT
- Title: Features extraction for image identification using computer vision
- Authors: Venant Niyonkuru, Sylla Sekou, Jimmy Jackson Sinzinkayo
- Abstract summary: The study focuses on Vision Transformers (ViTs) and other approaches such as Generative Adversarial Networks (GANs). Experimental results establish the merits and limitations of these methods and their practical applications in advancing computer vision.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study examines various feature extraction techniques in computer vision, the primary focus of which is on Vision Transformers (ViTs) and other approaches such as Generative Adversarial Networks (GANs), deep feature models, traditional approaches (SIFT, SURF, ORB), and non-contrastive and contrastive feature models. Emphasizing ViTs, the report summarizes their architecture, including patch embedding, positional encoding, and multi-head self-attention mechanisms, with which they outperform conventional convolutional neural networks (CNNs). Experimental results establish the merits and limitations of these methods and their practical applications in advancing computer vision.
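The ViT components named in the abstract (patch embedding, positional encoding, and self-attention) can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: all shapes and the random projections are assumptions, and multi-head attention is reduced to a single head with identity query/key/value projections for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(image, patch=4, dim=8):
    """Split an image (H, W) into non-overlapping patches and project each
    flattened patch to a `dim`-dimensional token embedding."""
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    w_proj = rng.standard_normal((patch * patch, dim))  # stand-in for a learned projection
    return patches @ w_proj  # (num_patches, dim)

def add_positional_encoding(tokens):
    """Add a per-token positional embedding (random here, learned in a real ViT)."""
    pos = rng.standard_normal(tokens.shape) * 0.02
    return tokens + pos

def self_attention(x):
    """Scaled dot-product self-attention over the patch tokens (single head)."""
    d = x.shape[-1]
    q, k, v = x, x, x  # identity projections for simplicity
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

image = rng.standard_normal((16, 16))           # toy 16x16 "image"
tokens = add_positional_encoding(patch_embed(image))
out = self_attention(tokens)
print(out.shape)  # (16, 8): 16 patch tokens, 8-dim embeddings
```

A real ViT would stack many such attention blocks with learned projections, layer norm, and MLPs; the sketch only shows how pixels become a sequence of tokens that attend to each other.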
Related papers
- Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques [91.26187560114381]
Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and contemporary deep learning approaches.
arXiv Detail & Related papers (2025-07-30T15:56:36Z) - Convolution goes higher-order: a biologically inspired mechanism empowers image classification [0.8999666725996975]
We propose a novel approach to image classification inspired by complex nonlinear biological visual processing. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models.
arXiv Detail & Related papers (2024-12-09T18:33:09Z) - Inverting Transformer-based Vision Models [0.8124699127636158]
We apply a modular approach of training inverse models to reconstruct input images from intermediate layers within a Detection Transformer and a Vision Transformer. Our analysis illustrates how these properties emerge within the models, contributing to a deeper understanding of transformer-based vision models.
arXiv Detail & Related papers (2024-12-09T14:43:06Z) - A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP).
These models are renowned for their ability to capture long-range dependencies and contextual information.
We discuss potential research directions and applications of transformer-based models in computer vision.
arXiv Detail & Related papers (2024-08-27T16:22:18Z) - Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z) - Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
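The FEC alternation described above can be illustrated with one k-means-style round over toy pixel features. This is not the authors' code: a single nearest-representative assignment stands in for the deep grouping step, and all dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def fec_round(features, k=4):
    """One grouping/updating alternation over (num_pixels, dim) features."""
    # Grouping: assign each pixel feature to its nearest representative,
    # initialized from k randomly chosen pixel features.
    reps = features[rng.choice(len(features), k, replace=False)]
    dists = ((features[:, None, :] - reps[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    # Abstract representatives: the mean feature of each cluster.
    reps = np.stack([features[assign == c].mean(axis=0) for c in range(k)])
    # Updating: replace each pixel's feature with its cluster representative.
    return reps[assign], assign

features = rng.standard_normal((64, 5))   # 64 "pixels", 5-dim features
updated, assign = fec_round(features)
print(updated.shape, assign.shape)  # (64, 5) (64,)
```

In the actual method the grouping and updating steps are differentiable layers trained end to end, rather than a hand-rolled clustering pass.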
arXiv Detail & Related papers (2024-03-26T06:04:50Z) - Manipulating Feature Visualizations with Gradient Slingshots [53.94925202421929]
Feature Visualization (FV) is a widely used technique for interpreting the concepts learned by Deep Neural Networks (DNNs). We introduce a novel method, Gradient Slingshots, that enables manipulation of FV without modifying the model architecture or significantly degrading its performance.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification [67.64124512185087]
Soft biometrics such as gait are widely used with face in surveillance tasks like person recognition and re-identification.
We propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks.
arXiv Detail & Related papers (2023-03-24T05:28:35Z) - Out of Distribution Performance of State of Art Vision Model [0.0]
According to this claim, ViT's self-attention mechanism makes it more robust than CNNs.
We investigate the performance of 58 state-of-the-art computer vision models in a unified training setup.
arXiv Detail & Related papers (2023-01-25T18:14:49Z) - Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore to model peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding to the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - Deep Features for training Support Vector Machine [16.795405355504077]
This paper develops a generic computer vision system based on features extracted from trained CNNs.
Multiple learned features are combined into a single structure to work on different image classification tasks.
arXiv Detail & Related papers (2021-04-08T03:13:09Z)
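The last entry's pipeline, combining features from multiple trained CNNs into a single structure for a downstream classifier such as an SVM, can be sketched as follows. The "CNN" extractors here are fixed random projections purely for illustration; the real system would use activations from pretrained networks and feed the fused descriptor to an SVM.

```python
import numpy as np

rng = np.random.default_rng(2)

def fake_cnn_features(images, dim):
    """Stand-in for a trained CNN feature extractor: a fixed random
    projection of the flattened pixels to a `dim`-dimensional descriptor."""
    w = rng.standard_normal((images.shape[1], dim))
    return images @ w

images = rng.standard_normal((10, 32 * 32))   # 10 flattened toy images
feats_a = fake_cnn_features(images, 16)       # features from "network A"
feats_b = fake_cnn_features(images, 24)       # features from "network B"

# Combine multiple learned features into a single structure: one fused
# descriptor per image, ready to train a classifier on.
combined = np.concatenate([feats_a, feats_b], axis=1)
print(combined.shape)  # (10, 40)
```

Concatenation is the simplest fusion choice; the paper's point is that such generic fused deep features transfer across different image classification tasks.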
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.