Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024
- URL: http://arxiv.org/abs/2406.12256v1
- Date: Tue, 18 Jun 2024 04:10:20 GMT
- Title: Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024
- Authors: Xiaoqi Wang, Yi Wang, Lap-Pui Chau,
- Abstract summary: We present our champion solution for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge in CVPR 2024.
This challenge differs from traditional visual-text retrieval tasks by providing a correlation matrix.
We propose a novel loss function, Symmetric Multi-Similarity Loss, which offers a more precise learning objective.
- Score: 17.622013322533423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present our champion solution for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge in CVPR 2024. Essentially, this challenge differs from traditional visual-text retrieval tasks by providing a correlation matrix that acts as a set of soft labels for video-text clip combinations. However, existing loss functions have not fully exploited this information. Motivated by this, we propose a novel loss function, Symmetric Multi-Similarity Loss, which offers a more precise learning objective. Together with tricks and ensemble learning, the model achieves 63.76% average mAP and 74.25% average nDCG on the public leaderboard, demonstrating the effectiveness of our approach. Our code will be released at: https://github.com/xqwang14/SMS-Loss/tree/main
Related papers
- CUCL: Codebook for Unsupervised Continual Learning [129.91731617718781]
The focus of this study is on Unsupervised Continual Learning (UCL), as it presents an alternative to Supervised Continual Learning.
We propose a method named Codebook for Unsupervised Continual Learning (CUCL) which promotes the model to learn discriminative features to complete the class boundary.
Our method significantly boosts the performances of supervised and unsupervised methods.
arXiv Detail & Related papers (2023-11-25T03:08:50Z) - Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic
Reasoning Task 2023 [13.326745559876558]
We present our solution to a Multi-modal Algorithmic Reasoning Task: SMART-101 Challenge.
This challenge evaluates the abstraction, deduction, and generalization abilities of neural networks in solving visuolinguistic puzzles.
Under the puzzle splits configuration, we achieved an accuracy score of 26.5 on the validation set and 24.30 on the private test set.
arXiv Detail & Related papers (2023-10-10T09:12:27Z) - Lightweight Boosting Models for User Response Prediction Using
Adversarial Validation [2.4040470282119983]
The ACM RecSys Challenge 2023, organized by ShareChat, aims to predict the probability of the app being installed.
This paper describes the lightweight solution to this challenge.
arXiv Detail & Related papers (2023-10-05T13:57:05Z) - 1st Place Solution of The Robust Vision Challenge (RVC) 2022 Semantic
Segmentation Track [67.56316745239629]
This report describes the winning solution to the semantic segmentation task of the Robust Vision Challenge on ECCV 2022.
Our method adopts the FAN-B-Hybrid model as the encoder and uses Segformer as the segmentation framework.
The proposed method could serve as a strong baseline for the multi-domain segmentation task and benefit future works.
arXiv Detail & Related papers (2022-10-23T20:52:22Z) - Exploiting Semantic Role Contextualized Video Features for
Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance
Retrieval Challenge 2022 [72.12974259966592]
We present our approach for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
We first parse sentences into semantic roles corresponding to verbs and nouns, then utilize self-attentions to exploit semantic role contextualized video features.
arXiv Detail & Related papers (2022-06-29T03:24:43Z) - SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval [15.522964295287425]
We propose a novel loss function that is based on self-labeling of the unknown classes.
We tested our approach on several real-world cross-modal retrieval problems, including text-based video retrieval, sketch-based image retrieval, and image-text retrieval.
arXiv Detail & Related papers (2021-11-10T17:17:09Z) - MAGNeto: An Efficient Deep Learning Method for the Extractive Tags
Summarization Problem [0.0]
We study a new image annotation task named Extractive Tags Summarization (ETS)
The goal is to extract important tags from the context lying in an image and its corresponding tags.
Our proposed solution consists of different widely used blocks like convolutional and self-attention layers.
arXiv Detail & Related papers (2020-11-09T11:34:21Z) - Tracklets Predicting Based Adaptive Graph Tracking [51.352829280902114]
We present an accurate and end-to-end learning framework for multi-object tracking, namely textbfTPAGT.
It re-extracts the features of the tracklets in the current frame based on motion predicting, which is the key to solve the problem of features inconsistent.
arXiv Detail & Related papers (2020-10-18T16:16:49Z) - Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state-of-the-art on several downstream tasks including video classification (EPIC-Kitchens), question answering (TVQA), captioning (TVC, YouCook2, and MSR-VTT)
arXiv Detail & Related papers (2020-06-12T14:07:04Z) - ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z) - Triplet Online Instance Matching Loss for Person Re-identification [14.233828198522266]
We propose a Triplet Online Instance Matching (TOIM) loss function, which lays emphasis on the hard samples and improves the accuracy of person ReID effectively.
It combines the advantages of OIM loss and Triplet loss and simplifies the process of batch construction.
It can be trained on-line when handle the joint detection and identification task.
arXiv Detail & Related papers (2020-02-24T21:55:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.