Explored An Effective Methodology for Fine-Grained Snake Recognition
- URL: http://arxiv.org/abs/2207.11637v1
- Date: Sun, 24 Jul 2022 02:19:15 GMT
- Title: Explored An Effective Methodology for Fine-Grained Snake Recognition
- Authors: Yong Huang, Aderon Huang, Wei Zhu, Yanming Fang, Jinghua Feng
- Abstract summary: We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
To take full advantage of unlabeled datasets, we jointly train with self-supervised and supervised learning.
Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, placing 1st among participants on the private leaderboard.
- Score: 8.908667065576632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-Grained Visual Classification (FGVC) is a longstanding and fundamental
problem in computer vision and pattern recognition, and underpins a diverse set
of real-world applications. This paper describes our contribution at
SnakeCLEF2022 with FGVC. Firstly, we design a strong multimodal backbone to
utilize various meta-information to assist in fine-grained identification.
Secondly, we provide new loss functions to address the long-tailed distribution
of the dataset. Then, in order to take full advantage of unlabeled datasets, we
jointly train with self-supervised and supervised learning to obtain a
pre-trained model. Moreover, several effective data-processing tricks are also
considered in our experiments. Last but not least, we fine-tune on the
downstream task with hard example mining and ensemble several models. Extensive
experiments demonstrate that our method effectively improves fine-grained
recognition performance, achieving macro F1 scores of 92.7% and 89.4% on the
private and public datasets, respectively, which is the 1st place among
participants on the private leaderboard.
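The abstract does not spell out its new long-tail loss functions. As an illustrative stand-in, logit-adjusted cross-entropy is one common remedy for long-tailed class distributions: each logit is shifted by the log class prior so rare species are not drowned out by common ones. The function name, `class_freqs`, and `tau` below are our own hypothetical choices, a minimal sketch rather than the paper's actual loss:

```python
import math

def logit_adjusted_ce(logits, label, class_freqs, tau=1.0):
    """Cross-entropy with logits shifted by tau * log(class prior).

    logits:      raw model scores, one per class
    label:       index of the true class
    class_freqs: training-set count for each class
    """
    total = sum(class_freqs)
    # Shift each logit toward its class prior; tail classes get a larger
    # negative shift, so misclassifying them costs more.
    adjusted = [z + tau * math.log(f / total)
                for z, f in zip(logits, class_freqs)]
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]
```

With uniform logits, the loss on a tail class exceeds the loss on a head class, which is exactly the pressure this family of losses applies during training.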
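For reference, the macro F1 metric used on the leaderboard averages per-class F1 scores, so every species counts equally regardless of how many samples it has. A minimal pure-Python computation (the function name is ours):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0 if both are 0.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)
```

Because each class contributes equally to the average, strong performance on rare snake species matters as much as on common ones, which is why the long-tail handling above is rewarded by this metric.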
Related papers
- Reinforcement Learning Based Multi-modal Feature Fusion Network for
Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Fine-Grained Adversarial Semi-supervised Learning [25.36956660025102]
We exploit Semi-Supervised Learning (SSL) to increase the amount of training data and improve the performance of Fine-Grained Visual Categorization (FGVC).
We demonstrate the effectiveness of the combined use by conducting experiments on six state-of-the-art fine-grained datasets.
arXiv Detail & Related papers (2021-10-12T09:24:22Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- SelfHAR: Improving Human Activity Recognition through Self-training with
Unlabeled Data [9.270269467155547]
SelfHAR is a semi-supervised model that learns to leverage unlabeled datasets to complement small labeled datasets.
Our approach combines teacher-student self-training, which distills the knowledge of unlabeled and labeled datasets.
SelfHAR is data-efficient, reaching similar performance using up to 10 times less labeled data compared to supervised approaches.
arXiv Detail & Related papers (2021-02-11T15:40:35Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To manage the resulting data scale, we apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)