DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
- URL: http://arxiv.org/abs/2106.13266v1
- Date: Thu, 24 Jun 2021 18:34:24 GMT
- Title: DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
- Authors: Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos,
Ioannis Kompatsiaris, Ioannis Patras
- Abstract summary: We propose a Knowledge Distillation framework, which we call Distill-and-Select (DnS).
We train several students with different architectures, arriving at different trade-offs of performance and efficiency.
Importantly, the proposed scheme allows Knowledge Distillation on large, unlabelled datasets, which leads to well-performing students.
- Score: 23.42790810694723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of high performance and computationally
efficient content-based video retrieval in large-scale datasets. Current
methods typically propose either: (i) fine-grained approaches employing
spatio-temporal representations and similarity calculations, achieving high
performance at a high computational cost or (ii) coarse-grained approaches
representing/indexing videos as global vectors, where the spatio-temporal
structure is lost, providing low performance but also having low computational
cost. In this work, we propose a Knowledge Distillation framework, which we
call Distill-and-Select (DnS), that starting from a well-performing
fine-grained Teacher Network learns: a) Student Networks at different retrieval
performance and computational efficiency trade-offs and b) a Selection Network
that at test time rapidly directs samples to the appropriate student to
maintain both high retrieval performance and high computational efficiency. We
train several students with different architectures and arrive at different
trade-offs of performance and efficiency, i.e., speed and storage requirements,
including fine-grained students that store/index videos using binary
representations. Importantly, the proposed scheme allows Knowledge Distillation
in large, unlabelled datasets -- this leads to good students. We evaluate DnS
on five public datasets on three different video retrieval tasks and
demonstrate a) that our students achieve state-of-the-art performance in
several cases and b) that our DnS framework provides an excellent trade-off
between retrieval performance, computational speed, and storage space. In
specific configurations, our method achieves similar mAP to the teacher but
is 20 times faster and requires 240 times less storage space. Our collected
dataset and implementation are publicly available:
https://github.com/mever-team/distill-and-select.
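As a concrete illustration of the routing described above, the following NumPy sketch scores every database video with a cheap coarse student and sends only the candidates flagged by a selector to the expensive fine-grained student. All function names, the selector interface, and the similarity functions are illustrative stand-ins rather than the paper's actual models (those are in the linked repository).

```python
import numpy as np

def coarse_similarity(q_vec, db_vecs):
    # Coarse-grained student: one global vector per video, a single dot product.
    return db_vecs @ q_vec

def fine_similarity(q_frames, db_frames_list):
    # Fine-grained student stand-in: frame-to-frame similarity with a
    # mean-over-max (Chamfer-like) aggregation.
    sims = []
    for frames in db_frames_list:
        pair = q_frames @ frames.T               # (Tq, Tdb) frame similarities
        sims.append(pair.max(axis=1).mean())
    return np.array(sims)

def dns_route(q_vec, q_frames, db_vecs, db_frames_list, selector, k=100):
    scores = coarse_similarity(q_vec, db_vecs)
    # The selector decides, per candidate, whether the cheap score suffices;
    # here it is any callable returning True when fine re-scoring is needed.
    need_fine = np.array([selector(q_vec, v, s) for v, s in zip(db_vecs, scores)])
    if need_fine.any():
        idx = np.where(need_fine)[0]
        # For simplicity, coarse and fine scores are treated as comparable here.
        scores[idx] = fine_similarity(q_frames, [db_frames_list[i] for i in idx])
    return np.argsort(-scores)[:k]
```

The efficiency gain comes from the selector sending only a small fraction of candidates down the expensive path, so most of the index is only ever touched by the one-dot-product student.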
Related papers
- Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs [56.040198387038025]
We present a novel prompt-guided visual perception framework (abbreviated as Free Video-LLM) for efficient inference of training-free video LLMs.
Our method effectively reduces the number of visual tokens while maintaining high performance across multiple video question-answering benchmarks.
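A hedged sketch of what prompt-guided token reduction can look like: rank visual tokens by cosine similarity to the prompt embedding and keep a fixed fraction. The function and the keep-ratio policy are assumptions for illustration, not Free Video-LLM's exact mechanism.

```python
import numpy as np

def prune_visual_tokens(visual_tokens, prompt_emb, keep_ratio=0.25):
    # visual_tokens: (N, D) patch/frame tokens; prompt_emb: (D,) text embedding.
    sims = visual_tokens @ prompt_emb / (
        np.linalg.norm(visual_tokens, axis=1) * np.linalg.norm(prompt_emb) + 1e-8)
    k = max(1, int(len(visual_tokens) * keep_ratio))
    keep = np.argsort(-sims)[:k]              # most prompt-relevant tokens
    return visual_tokens[np.sort(keep)]       # preserve original token order
```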
arXiv Detail & Related papers (2024-10-14T12:35:12Z)
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning [15.998149438353133]
We propose a two-stage retrieval architecture for text-to-video retrieval.
In the training phase, we design a parameter-free text-gated interaction block (TIB) for fine-grained video representation learning.
In the retrieval phase, we use coarse-grained video representations for fast recall of top-k candidates, which are then reranked by fine-grained video representations.
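The two-stage pattern is straightforward to sketch: recall with global vectors, then rerank the shortlist with a fine-grained scorer. Names and the `fine_sim` callable are illustrative, not the paper's TIB-based model.

```python
import numpy as np

def retrieve(text_vec, video_vecs, video_frames, fine_sim, k=50, final=10):
    # Stage 1: coarse recall -- one dot product per video in the index.
    shortlist = np.argsort(-(video_vecs @ text_vec))[:k]
    # Stage 2: rerank only the shortlist with the expensive fine-grained model.
    fine_scores = np.array([fine_sim(text_vec, video_frames[i]) for i in shortlist])
    return shortlist[np.argsort(-fine_scores)[:final]]
```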
arXiv Detail & Related papers (2024-01-01T08:54:18Z)
- Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation [29.952771954087602]
Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps described by the natural language query from untrimmed videos.
This paper discusses the challenge of achieving efficient computation in TSGV models while maintaining high performance.
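A generic multi-teacher distillation objective, of the kind such methods build on: a weighted sum of temperature-scaled KL terms, one per teacher. This is a standard formulation, not necessarily the paper's exact loss.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t - (z / t).max(axis=-1, keepdims=True)   # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, t=2.0):
    # Weighted sum of KL(teacher || student) over all teachers.
    p_s = softmax(student_logits, t)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p_t = softmax(t_logits, t)
        loss += w * np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    return loss * t ** 2    # usual temperature-squared scaling
```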
arXiv Detail & Related papers (2023-08-07T17:07:48Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, combining a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works, one of which is that it can calculate the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
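The density-peaks step can be sketched with the classic Rodriguez-Laio rule: key frames are those that are both locally dense and far from any denser frame. The cutoff and the automatic threshold below are illustrative assumptions; the paper's temporal-segment variant is not reproduced.

```python
import numpy as np

def density_peak_keyframes(features, cutoff=1.0):
    # features: (T, D) per-frame CNN features.
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    rho = (dist < cutoff).sum(axis=1) - 1             # local density (exclude self)
    delta = np.empty(len(features))
    for i in range(len(features)):
        higher = np.where(rho > rho[i])[0]            # frames denser than frame i
        delta[i] = dist[i].max() if higher.size == 0 else dist[i, higher].min()
    gamma = rho * delta                # peaks are dense AND far from denser frames
    keys = np.where(gamma > gamma.mean() + gamma.std())[0]   # illustrative auto-cutoff
    return keys if keys.size else np.array([gamma.argmax()])
```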
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
- Learning Binary and Sparse Permutation-Invariant Representations for Fast and Memory Efficient Whole Slide Image Search [3.2580463372881234]
We propose a novel framework for learning binary and sparse WSI representations utilizing deep generative modelling and the Fisher Vector.
We introduce new loss functions for learning sparse and binary permutation-invariant WSI representations that employ instance-based training.
The proposed method outperforms Yottixel (a recent search engine for histopathology images) both in terms of retrieval accuracy and speed.
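Why binary representations pay off: storage drops to one bit per dimension and search reduces to XOR/popcount. A minimal sketch, with sign binarization standing in for the paper's learned binarization:

```python
import numpy as np

def binarize(x):
    # Sign binarization: 1 bit per dimension, packed into uint8.
    return np.packbits((x > 0).astype(np.uint8), axis=-1)

def hamming_search(query_code, db_codes, k=10):
    # XOR, then count differing bits; no floating-point math at query time.
    diff = np.unpackbits(query_code ^ db_codes, axis=-1).sum(axis=-1)
    return np.argsort(diff)[:k]
```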
arXiv Detail & Related papers (2022-08-29T14:56:36Z)
- Temporal Saliency Query Network for Efficient Video Recognition [82.52760040577864]
Video recognition is a hot research topic, driven by the explosive growth of multimedia data on the Internet and mobile devices.
Most existing methods select the salient frames without awareness of the class-specific saliency scores.
We propose a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement.
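The TSQ idea can be approximated as scoring frames against learned class query vectors and keeping the frames some class attends to most. A rough illustrative sketch, not the exact mechanism:

```python
import numpy as np

def select_salient_frames(frame_feats, class_queries, n_select=8):
    # frame_feats: (T, D) frame features; class_queries: (C, D), one per class.
    sal = frame_feats @ class_queries.T       # (T, C) class-specific saliency
    frame_score = sal.max(axis=1)             # salient if ANY class attends to it
    keep = np.sort(np.argsort(-frame_score)[:n_select])   # keep temporal order
    return frame_feats[keep], keep
```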
arXiv Detail & Related papers (2022-07-21T09:23:34Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Experiments show the proposed algorithm to be more accurate and efficient than existing algorithms.
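Filter methods of this kind score each feature independently and keep the top-ranked ones. Since the summary does not give the exact Compactness Score criterion, the sketch below uses a simple nearest-neighbour compactness as an assumed stand-in:

```python
import numpy as np

def feature_compactness(X, n_neighbors=5):
    # X: (n_samples, n_features); lower score = samples cluster more
    # compactly along that feature.
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        gaps = np.abs(col[:, None] - col[None, :])
        gaps.sort(axis=1)                      # gaps[:, 0] is the zero self-distance
        scores[j] = gaps[:, 1:n_neighbors + 1].mean()
    return scores

def select_features(X, n_keep):
    return np.argsort(feature_compactness(X))[:n_keep]
```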
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models [121.22644352431199]
We use Neural Architecture Search (NAS) to automatically distill several compressed students with variable cost from a large model.
Current works train a single SuperLM consisting of millions of subnetworks with weight-sharing.
Experiments on GLUE benchmark against state-of-the-art KD and NAS methods demonstrate AutoDistil to outperform leading compression techniques.
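The search-space side can be sketched as enumerating student configurations and filtering by a cost proxy; the grid and the parameter formula below are assumptions, and AutoDistil's few-shot SuperLM training is not reproduced:

```python
from itertools import product

def student_search_space():
    # Hypothetical grid over depth/width/heads for candidate students.
    for layers, hidden, heads in product([4, 6, 12], [256, 512, 768], [4, 8, 12]):
        params = layers * 12 * hidden * hidden    # rough per-block parameter proxy
        yield {"layers": layers, "hidden": hidden, "heads": heads, "params": params}

# Pick students at a chosen cost level, e.g. everything under 30M parameters.
cheap_students = [s for s in student_search_space() if s["params"] < 30e6]
```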
arXiv Detail & Related papers (2022-01-29T06:13:04Z)
- Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
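The testing-time behaviour reduces to picking, from pre-evaluated sub-architectures, the most accurate one that fits the dynamically specified budget. A toy sketch with made-up candidate numbers:

```python
def pick_architecture(candidates, budget_flops):
    # candidates: dicts like {"name": ..., "flops": ..., "acc": ...}
    feasible = [c for c in candidates if c["flops"] <= budget_flops]
    return max(feasible, key=lambda c: c["acc"]) if feasible else None

subnets = [
    {"name": "tiny",  "flops": 0.5e9, "acc": 0.71},
    {"name": "small", "flops": 1.2e9, "acc": 0.76},
    {"name": "base",  "flops": 4.0e9, "acc": 0.80},
]
print(pick_architecture(subnets, budget_flops=2e9))  # -> the "small" subnet
```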
arXiv Detail & Related papers (2021-08-03T00:54:27Z)
- Efficient Model Performance Estimation via Feature Histories [27.008927077173553]
An important step in the task of neural network design is the evaluation of a model's performance.
In this work, we use the evolution history of features of a network during the early stages of training to build a proxy classifier.
We show that our method can be combined with multiple search algorithms to find better solutions to a wide range of tasks.
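The proxy idea can be sketched as concatenating early-epoch feature snapshots, fitting a cheap classifier on them, and using its held-out accuracy to rank candidate networks; scikit-learn's LogisticRegression is an assumed stand-in for the paper's proxy classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def proxy_score(feature_history, labels, val_frac=0.2):
    # feature_history: list of (n_samples, d) feature snapshots from early epochs.
    X = np.concatenate(feature_history, axis=1)   # stack snapshots over time
    n_val = int(len(labels) * val_frac)
    clf = LogisticRegression(max_iter=1000).fit(X[n_val:], labels[n_val:])
    return clf.score(X[:n_val], labels[:n_val])   # higher = more promising network
```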
arXiv Detail & Related papers (2021-03-07T20:41:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.