PP-ShiTu: A Practical Lightweight Image Recognition System
- URL: http://arxiv.org/abs/2111.00775v1
- Date: Mon, 1 Nov 2021 09:04:54 GMT
- Title: PP-ShiTu: A Practical Lightweight Image Recognition System
- Authors: Shengyu Wei, Ruoyu Guo, Cheng Cui, Bin Lu, Shuilong Dong, Tingquan
Gao, Yuning Du, Ying Zhou, Xueying Lyu, Qiwen Liu, Xiaoguang Hu, Dianhai Yu,
Yanjun Ma
- Abstract summary: We propose a practical lightweight image recognition system, named PP-ShiTu, consisting of three modules: mainbody detection, feature extraction, and vector search.
We introduce popular strategies including metric learning, deep hashing, knowledge distillation, and model quantization to improve accuracy and inference speed.
Experiments on different datasets and benchmarks show that the system is widely effective in different domains of image recognition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, image recognition applications have developed rapidly. A
large number of studies and techniques have emerged in different fields, such
as face recognition, pedestrian and vehicle re-identification, landmark
retrieval, and product recognition. In this paper, we propose a practical
lightweight image recognition system, named PP-ShiTu, consisting of three
modules: mainbody detection, feature extraction, and vector search.
We introduce popular strategies including metric learning, deep hashing,
knowledge distillation, and model quantization to improve accuracy and
inference speed.
With these strategies, PP-ShiTu works well in different scenarios with a single
set of models trained on a mixed dataset. Experiments on different datasets and
benchmarks show that the system is widely effective across different domains of
image recognition. All the above-mentioned models are open-sourced, and the
code is available in the PaddleClas repository under the PaddlePaddle
organization on GitHub.
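The three-stage pipeline described in the abstract (mainbody detection, then feature extraction, then vector search) can be sketched as follows. This is a minimal illustration, not PaddleClas code: `detect_mainbody` and `extract_feature` are hypothetical stand-in stubs for the real detection and embedding models, and the vector search is a brute-force cosine-similarity lookup in NumPy rather than an approximate-nearest-neighbor index.

```python
import numpy as np

def detect_mainbody(image):
    # Stand-in for the mainbody detector: returns the whole image as one crop.
    # In PP-ShiTu this stage is a lightweight object-detection model.
    return [image]

def extract_feature(crop, dim=128):
    # Stand-in for the feature-extraction model: a fixed random projection
    # followed by L2 normalization, so identical inputs map to identical
    # unit-length vectors. A real system would use a trained CNN embedding.
    rng = np.random.default_rng(0)  # fixed seed -> the same projection every call
    proj = rng.standard_normal((crop.size, dim))
    feat = crop.reshape(-1) @ proj
    return feat / (np.linalg.norm(feat) + 1e-12)

def build_gallery(images, labels):
    # Index every gallery image: detect, embed, and stack the features.
    feats = np.stack([extract_feature(detect_mainbody(img)[0]) for img in images])
    return feats, list(labels)

def search(query_image, gallery_feats, gallery_labels, top_k=1):
    # Brute-force vector search: cosine similarity reduces to a dot product
    # because all features are L2-normalized.
    q = extract_feature(detect_mainbody(query_image)[0])
    sims = gallery_feats @ q
    order = np.argsort(-sims)[:top_k]
    return [(gallery_labels[i], float(sims[i])) for i in order]

# Tiny demo: two "products" represented as 8x8 grayscale arrays.
gallery_imgs = [np.zeros((8, 8)), np.ones((8, 8))]
feats, labels = build_gallery(gallery_imgs, ["product_a", "product_b"])
result = search(np.ones((8, 8)) * 0.9, feats, labels)
```

In a deployed system the brute-force dot product would be replaced by an approximate-nearest-neighbor index, and the stub models by the quantized detection and embedding networks the paper describes.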
Related papers
- GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
Multimodal place recognition has gained increasing attention due to its ability to overcome the weaknesses of single-sensor systems.
We propose a 3D Gaussian-based multimodal place recognition neural network dubbed GSPR.
arXiv Detail & Related papers (2024-10-01T00:43:45Z) - MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
arXiv Detail & Related papers (2024-07-22T14:24:56Z) - CSP: Self-Supervised Contrastive Spatial Pre-Training for
Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled-training-data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z) - SiLK -- Simple Learned Keypoints [11.208547877814574]
Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
Recent learning-based methods employ a vast diversity of experimental setups and design choices.
We re-design each component from first principles and propose Simple Learned Keypoints (SiLK), which is fully-differentiable, lightweight, and flexible.
arXiv Detail & Related papers (2023-04-12T23:56:00Z) - Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies on the ImageNet-LT, Places-LT, and WebVision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z) - Deep Learning Computer Vision Algorithms for Real-time UAVs On-board
Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z) - Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z) - Why-So-Deep: Towards Boosting Previously Trained Models for Visual Place
Recognition [12.807343105549409]
We present an intelligent method, MAQBOOL, to amplify the power of pre-trained models for better image recall.
We achieve comparable image retrieval results at a low descriptor dimension (512-D), compared to the high descriptor dimension (4096-D) of state-of-the-art methods.
arXiv Detail & Related papers (2022-01-10T08:39:06Z) - IterMVS: Iterative Probability Estimation for Efficient Multi-View
Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
arXiv Detail & Related papers (2021-12-09T18:58:02Z) - Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z) - HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning [24.13425816781179]
Local feature extraction remains an active research area due to the advances in fields such as SLAM, 3D reconstructions, or AR applications.
We propose a method that treats both extractions independently and focuses on their interaction in the learning process.
We show improvements over the state of the art in image matching on HPatches and in 3D reconstruction quality, while remaining on par on camera localisation tasks.
arXiv Detail & Related papers (2020-05-12T13:55:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.