EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
- URL: http://arxiv.org/abs/2506.21080v1
- Date: Thu, 26 Jun 2025 08:09:16 GMT
- Title: EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
- Authors: Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao,
- Abstract summary: EgoAdapt is a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across egocentric perception tasks. Our results show that EgoAdapt significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters by up to 82.02%, and energy by up to 9.6x.
- Score: 59.93605371289108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets (EPIC-Kitchens, EasyCom, and Aria Everyday Activities) demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters by up to 82.02%, and energy by up to 9.6x, while remaining on par with, and in many cases outperforming, corresponding state-of-the-art models.
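The abstract gives no implementation details, but the general idea of a learned policy that decides which sensory branches to execute per sample can be illustrated with a small sketch. The module below uses a Gumbel-softmax gate over keep/skip decisions; all names, dimensions, and the gating design are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a per-sample modality-selection policy (not the authors' code).
# A tiny gating network looks at cheap summary features and decides, per modality,
# whether the expensive branch for that modality should be executed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    def __init__(self, feat_dim: int, num_modalities: int, tau: float = 1.0):
        super().__init__()
        self.tau = tau
        self.num_modalities = num_modalities
        # Two logits (keep / skip) per modality.
        self.gate = nn.Linear(feat_dim, num_modalities * 2)

    def forward(self, cheap_features: torch.Tensor) -> torch.Tensor:
        logits = self.gate(cheap_features).view(-1, self.num_modalities, 2)
        # Gumbel-softmax gives hard 0/1 decisions while remaining differentiable
        # during training (straight-through estimator).
        decisions = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=-1)
        return decisions[..., 0]  # 1 = run this modality's branch, 0 = skip it

# Usage: mask per-modality features before the (expensive) fusion backbone.
policy = ModalityPolicy(feat_dim=128, num_modalities=3)
summary = torch.randn(4, 128)   # cheap per-sample summary features
keep = policy(summary)          # shape (4, 3), entries in {0, 1}
print(keep)
```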
Related papers
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
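As a generic illustration of activation steering along a single direction (the layer, direction vector, and scale below are placeholders, not the cited method's actual components):

```python
# Hypothetical illustration of activation steering: add a fixed direction vector
# to a hidden layer's output at inference time. The layer, direction, and scale
# here are placeholders, not values from the cited paper.
import torch
import torch.nn as nn

hidden_dim = 64
model = nn.Sequential(nn.Linear(16, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 8))

steer_direction = torch.randn(hidden_dim)
steer_direction = steer_direction / steer_direction.norm()   # unit direction
alpha = 2.0                                                   # steering strength

def steering_hook(module, inputs, output):
    # Shift the activations along the chosen direction.
    return output + alpha * steer_direction

handle = model[0].register_forward_hook(steering_hook)
out = model(torch.randn(2, 16))   # forward pass now uses the steered activations
handle.remove()
print(out.shape)
```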
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
- Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities [43.15852057358654]
We introduce an efficient multimodal knowledge distillation approach for egocentric action recognition. Our method focuses on resource-efficient development by leveraging pre-trained models as unimodal feature extractors in our teacher model.
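For reference, the kind of soft-label distillation objective such a teacher-student setup typically uses looks like the following standard formulation (the exact loss in the cited paper may differ):

```python
# Generic soft-label knowledge distillation loss (a standard formulation,
# not necessarily the exact objective used in the cited paper).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from the (multimodal) teacher, softened by temperature T.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth action labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
print(loss.item())
```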
arXiv Detail & Related papers (2025-04-11T14:30:42Z)
- Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing [0.0]
We integrate a meta-reinforcement learning algorithm with the DreamerV3 architecture to improve load balancing in operating systems. This approach enables rapid adaptation to dynamic workloads with minimal retraining, outperforming the Advantage Actor-Critic (A2C) algorithm in standard and adaptive trials.
arXiv Detail & Related papers (2025-03-11T20:36:49Z)
- Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition [22.615830919860777]
This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter).
We devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with an adaptive training strategy.
We reduce FLOPs during inference by 50% while maintaining, and in some cases improving, recognition accuracy.
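A minimal sketch of the early-head idea follows: lightweight classification heads attached at intermediate depths, with inference stopping once a head is confident enough. The architecture and threshold are assumptions for illustration, not the paper's design.

```python
# Hypothetical early-exit sketch: lightweight heads after each stage let easy
# inputs leave the network early. Architecture and threshold are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, num_classes=10, num_stages=4, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_stages)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_stages))
        self.threshold = threshold

    def forward(self, x):
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            probs = F.softmax(head(x), dim=-1)
            # Exit as soon as the current head is confident enough.
            if probs.max().item() >= self.threshold:
                return probs
        return probs  # fall back to the deepest head

net = EarlyExitNet()
print(net(torch.randn(1, 64)).shape)   # single-sample inference with early exit
```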
arXiv Detail & Related papers (2024-07-19T13:33:38Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original $\sim 30\%$ drop to only $\sim 10\%$ when half of the test set is modal-incomplete.
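A minimal sketch of that idea, substituting a learned placeholder embedding wherever a modality's features are unavailable, could look as follows (names and shapes are illustrative assumptions, not the paper's implementation):

```python
# Hypothetical sketch of a learned "missing modality" token: when a modality's
# features are absent, a trainable placeholder embedding is used instead.
import torch
import torch.nn as nn

class MissingModalityEmbed(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.missing_token = nn.Parameter(torch.zeros(dim))
        nn.init.normal_(self.missing_token, std=0.02)

    def forward(self, features, present_mask):
        # features: (batch, dim); present_mask: (batch,) with 1 = modality available.
        mask = present_mask.unsqueeze(-1).float()
        return mask * features + (1.0 - mask) * self.missing_token

embed = MissingModalityEmbed(dim=32)
feats = torch.randn(4, 32)
present = torch.tensor([1, 0, 1, 0])    # half the batch is missing this modality
print(embed(feats, present).shape)
```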
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
- Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion [4.716845031095804]
Transformer models can face practical limitations due to their high computational requirements.
Such models exhibit significant activation sparsity, which can be leveraged to reduce the inference cost by converting parts of the network into equivalent Mixture-of-Experts (MoE) layers.
We demonstrate that the efficiency of the conversion can be significantly enhanced by a proper regularization of the activation sparsity of the base model.
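A very rough sketch of the underlying idea, partitioning a feed-forward layer into expert groups and executing only the top-k groups selected by a router, is shown below; the sizes, routing, and scoring are illustrative assumptions, and the cited paper's conversion and dynamic-k scheme are more involved.

```python
# Hypothetical sketch of an FFN split into expert groups with top-k routing.
# Group sizes, the router, and k are illustrative; the cited paper's conversion
# and dynamic-k scheme differ in detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKExpertFFN(nn.Module):
    def __init__(self, dim=64, hidden=256, num_experts=8, k=2):
        super().__init__()
        assert hidden % num_experts == 0
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden // num_experts), nn.ReLU(),
                          nn.Linear(hidden // num_experts, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):
        # x: (batch, dim). Pick the k highest-scoring experts per sample.
        scores = F.softmax(self.router(x), dim=-1)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for w, idx in zip(topk_scores[b], topk_idx[b]):
                out[b] += w * self.experts[int(idx)](x[b:b + 1]).squeeze(0)
        return out

ffn = TopKExpertFFN()
print(ffn(torch.randn(4, 64)).shape)
```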
arXiv Detail & Related papers (2023-10-06T16:34:51Z)
- Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)