AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
- URL: http://arxiv.org/abs/2105.05165v2
- Date: Wed, 12 May 2021 17:49:10 GMT
- Title: AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
- Authors: Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko,
Aude Oliva, Rogerio Feris
- Abstract summary: We propose an adaptive multi-modal learning framework, called AdaMML, that selects on the fly the optimal modalities for each segment, conditioned on the input, for efficient video recognition.
We show that our proposed approach yields 35%-55% reduction in computation when compared to the traditional baseline.
- Score: 61.51188561808917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal learning, which focuses on utilizing various modalities to
improve the performance of a model, is widely used in video recognition. While
traditional multi-modal learning offers excellent recognition results, its
computational expense limits its impact for many real-world applications. In
this paper, we propose an adaptive multi-modal learning framework, called
AdaMML, that selects on the fly the optimal modalities for each segment,
conditioned on the input, for efficient video recognition. Specifically, given a
video segment, a multi-modal policy network is used to decide what modalities
should be used for processing by the recognition model, with the goal of
improving both accuracy and efficiency. We efficiently train the policy network
jointly with the recognition model using standard back-propagation. Extensive
experiments on four challenging and diverse datasets demonstrate that our proposed
adaptive approach yields 35%-55% reduction in computation when compared to the
traditional baseline that simply uses all the modalities irrespective of the
input, while also achieving consistent improvements in accuracy over the
state-of-the-art methods.
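
For intuition, the sketch below shows one way the per-segment modality selection described in the abstract could be wired up in PyTorch. Everything here is illustrative: the PolicyNet architecture, the feature dimensions, and the two-modality (RGB and audio) setup are assumptions, and the straight-through Gumbel-Softmax relaxation is a common way to make discrete keep/skip decisions trainable with standard back-propagation, not necessarily the authors' exact mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of per-segment modality selection with a policy
# network, in the spirit of AdaMML. Names, dimensions, and the modality
# set are illustrative assumptions, not the authors' code.

class PolicyNet(nn.Module):
    """Maps a cheap per-segment feature to keep/skip decisions per modality."""
    def __init__(self, feat_dim: int, num_modalities: int):
        super().__init__()
        # Two logits per modality: {skip, keep}.
        self.head = nn.Linear(feat_dim, num_modalities * 2)
        self.num_modalities = num_modalities

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        logits = self.head(feat).view(-1, self.num_modalities, 2)
        # Straight-through Gumbel-Softmax: one-hot decisions in the
        # forward pass, soft gradients in the backward pass, so the
        # policy trains jointly with the recognizer via back-propagation.
        decisions = F.gumbel_softmax(logits, tau=1.0, hard=True)
        return decisions[..., 1]  # "keep" indicator, shape (batch, modalities)

# Usage sketch: gate each modality's (expensive) branch output.
B, feat_dim, num_modalities, num_classes = 4, 128, 2, 10
policy = PolicyNet(feat_dim, num_modalities)
segment_feat = torch.randn(B, feat_dim)    # cheap per-segment "glance" feature
keep = policy(segment_feat)                # (B, 2) hard 0/1 decisions

rgb_logits = torch.randn(B, num_classes)   # stand-in for the RGB branch
audio_logits = torch.randn(B, num_classes) # stand-in for the audio branch
fused = keep[:, 0:1] * rgb_logits + keep[:, 1:2] * audio_logits
```

At inference time the point of the method is that skipped branches are never evaluated at all, which is where the reported 35%-55% computation savings would come from; this sketch merely zeroes them out to keep the example self-contained.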