Active Acquisition for Multimodal Temporal Data: A Challenging
Decision-Making Task
- URL: http://arxiv.org/abs/2211.05039v2
- Date: Mon, 3 Jul 2023 14:47:18 GMT
- Title: Active Acquisition for Multimodal Temporal Data: A Challenging
Decision-Making Task
- Authors: Jannik Kossen, Cătălina Cangea, Eszter Vértes, Andrew
Jaegle, Viorica Patraucean, Ira Ktena, Nenad Tomasev, Danielle Belgrave
- Abstract summary: We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT).
We aim to learn agents that actively select which modalities of an input to acquire, trading off acquisition cost and predictive performance.
Applications of A2MT may be impactful in domains like medicine, robotics, or finance, where modalities differ in acquisition cost and informativeness.
- Score: 13.291343999247898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a challenging decision-making task that we call active
acquisition for multimodal temporal data (A2MT). In many real-world scenarios,
input features are not readily available at test time and must instead be
acquired at significant cost. With A2MT, we aim to learn agents that actively
select which modalities of an input to acquire, trading off acquisition cost
and predictive performance. A2MT extends a previous task called active feature
acquisition to temporal decision making about high-dimensional inputs. We
propose a method based on the Perceiver IO architecture to address A2MT in
practice. Our agents are able to solve a novel synthetic scenario requiring
practically relevant cross-modal reasoning skills. On two large-scale,
real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn
cost-reactive acquisition behavior. However, an ablation reveals they are
unable to learn adaptive acquisition strategies, emphasizing the difficulty of
the task even for state-of-the-art models. Applications of A2MT may be
impactful in domains like medicine, robotics, or finance, where modalities
differ in acquisition cost and informativeness.
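To make the task concrete, here is a minimal sketch of an A2MT-style acquisition loop: at each timestep an agent decides which modalities to pay for, and the episode reward trades prediction quality against total acquisition cost. The modality names, prices, random policy and predictor, and the linear reward are illustrative assumptions, not the paper's actual objective or its Perceiver IO-based implementation.

```python
# Toy A2MT-style loop (assumed API, not the paper's implementation):
# at each step the agent picks which modalities to buy, pays their cost,
# and a predictor consumes whatever was acquired.
import numpy as np

MODALITIES = ["video", "audio"]        # e.g. Kinetics-700 / AudioSet streams
COSTS = {"video": 1.0, "audio": 0.2}   # assumed per-step acquisition prices
LAMBDA = 0.5                           # assumed cost/performance trade-off weight
rng = np.random.default_rng(0)

def agent_policy(observed, t):
    """Placeholder policy: buy each modality with probability 0.5.
    A learned agent (the paper uses a Perceiver IO backbone) would
    condition on everything acquired so far."""
    return [m for m in MODALITIES if rng.random() < 0.5]

def predictor(observed):
    """Placeholder classifier over the partially observed input."""
    return rng.integers(0, 700)        # e.g. a Kinetics-700 class id

def run_episode(stream, label, horizon=16):
    observed, spent = [], 0.0
    for t in range(horizon):
        for m in agent_policy(observed, t):
            observed.append((t, m, stream[m][t]))  # reveal this time slice
            spent += COSTS[m]
    hit = float(predictor(observed) == label)
    return hit - LAMBDA * spent        # reward: accuracy minus acquisition cost

stream = {m: rng.normal(size=(16, 8)) for m in MODALITIES}
print(run_episode(stream, label=3))
```

A trained agent would replace both placeholders with a single model conditioned on the partially observed stream; the abstract's ablation suggests that learning genuinely adaptive, input-dependent acquisition from this signal remains difficult even for state-of-the-art models.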
Related papers
- Enhancing Cross-task Transfer of Large Language Models via Activation Steering [75.41750053623298]
Cross-task in-context learning offers a direct solution for transferring knowledge across tasks. We investigate whether cross-task transfer can be achieved via latent space steering without parameter updates or input expansion. We propose a novel Cross-task Activation Steering Transfer framework that enables effective transfer by manipulating the model's internal activation states.
arXiv Detail & Related papers (2025-07-17T15:47:22Z) - NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data [23.75715594365611]
We propose NOCTA, a Non-Greedy Objective Cost-Tradeoff Acquisition method. We first introduce a cohesive estimation target for our NOCTA setting, and then develop two complementary estimators. Experiments on synthetic and real-world medical datasets demonstrate that both NOCTA variants outperform existing baselines.
arXiv Detail & Related papers (2025-07-16T17:00:41Z) - CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems [38.20651868834145]
Collaborative Auxiliary Modality Learning (CAML) is a novel multi-agent multi-modality framework.
It enables agents to collaborate and share multimodal data during training while allowing inference with reduced modalities per agent during testing.
arXiv Detail & Related papers (2025-02-25T03:59:40Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - ModalPrompt: Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models [40.7613157799378]
Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed datasets jointly.
Existing methods leverage data replay or model expansion, both of which are not specially developed for LMMs.
We propose a novel dual-modality guided prompt learning framework (ModalPrompt) tailored for multimodal continual learning.
arXiv Detail & Related papers (2024-10-08T09:35:37Z) - Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z) - Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent (a toy sketch of this token-substitution idea appears after this list).
Our method mitigates the performance loss, reducing the original ~30% drop to only ~10% when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z) - Learning Computational Efficient Bots with Costly Features [9.39143793228343]
We propose a generic offline learning approach where the computation cost of the input features is taken into account.
We demonstrate the effectiveness of our method on several tasks, including D4RL benchmarks and complex 3D environments similar to those found in video games.
arXiv Detail & Related papers (2023-08-18T15:43:31Z) - High-Modality Multimodal Transformer: Quantifying Modality & Interaction
Heterogeneity for High-Modality Representation Learning [112.51498431119616]
This paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities.
A single model, HighMMT, scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas.
arXiv Detail & Related papers (2022-03-02T18:56:20Z) - Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks (a toy entropy-scoring sketch appears after this list).
arXiv Detail & Related papers (2021-10-21T05:38:45Z) - Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z) - Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z) - Active Feature Acquisition with Generative Surrogate Models [11.655069211977464]
In this work, we consider models that perform active feature acquisition (AFA) and query the environment for unobserved features.
Our work reformulates the Markov decision process (MDP) that underlies the AFA problem as a generative modeling task.
We propose learning a generative surrogate model (GSM) that captures the dependencies among input features to assess potential information gain from acquisitions (a toy scoring sketch follows this list).
arXiv Detail & Related papers (2020-10-06T02:10:06Z)
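To illustrate the GSM entry directly above, here is a toy sketch of surrogate-based acquisition scoring under stated assumptions: a stand-in "surrogate" imputes plausible values for each unobserved feature, and features are ranked by the expected reduction in predictive entropy. The Gaussian sampler and linear classifier are placeholders, not the paper's model.

```python
# Toy surrogate-based acquisition scoring (illustrative stand-ins only):
# impute plausible values for each unobserved feature and rank features by
# the expected reduction in predictive entropy those values would bring.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))            # toy classifier: 3 features, 2 classes

def predict_probs(x):
    logits = x @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum()

def surrogate_sample(observed, idx, n=32):
    """Stand-in for the GSM: a real surrogate would model p(x_idx | observed).
    Here we just draw Gaussians loosely tied to the observed values."""
    return rng.normal(loc=0.1 * observed.sum(), scale=1.0, size=n)

def expected_info_gain(observed, idx):
    base = entropy(predict_probs(observed))
    imputed_entropies = []
    for v in surrogate_sample(observed, idx):
        x = observed.copy()
        x[idx] = v
        imputed_entropies.append(entropy(predict_probs(x)))
    return base - np.mean(imputed_entropies)   # expected entropy reduction

x = np.array([0.5, 0.0, 0.0])     # feature 0 observed; 1 and 2 unacquired
gains = {i: expected_info_gain(x, i) for i in (1, 2)}
print(max(gains, key=gains.get))  # acquire the most promising feature next
```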
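The Missing Modality Token entry above ("Exploring Missing Modality in Multimodal Egocentric Datasets") can likewise be sketched: when a modality is absent at test time, a learned per-modality embedding stands in for its features rather than zero-filling. The dimensions, mean fusion, and classifier head here are assumptions for illustration.

```python
# Toy Missing Modality Token (MMT) sketch: substitute a learned embedding
# for any modality that is absent at test time instead of zero-filling.
# Dimensions, mean fusion, and the classifier head are assumptions.
import torch
import torch.nn as nn

class MMTFusion(nn.Module):
    def __init__(self, dim=256, num_modalities=2, num_classes=10):
        super().__init__()
        # one learnable "missing" token per modality (trained end to end)
        self.missing_tokens = nn.Parameter(torch.randn(num_modalities, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, feats):
        # feats: list of per-modality tensors (batch, dim), or None if absent
        batch = next(f for f in feats if f is not None).shape[0]
        fused = []
        for i, f in enumerate(feats):
            if f is None:                       # modality missing at test time
                f = self.missing_tokens[i].expand(batch, -1)
            fused.append(f)
        return self.head(torch.stack(fused).mean(dim=0))

model = MMTFusion()
video = torch.randn(4, 256)    # video features present
logits = model([video, None])  # audio missing at test time
print(logits.shape)            # torch.Size([4, 10])
```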
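Finally, a compact sketch of the single-modal entropy idea from "Single-Modal Entropy based Active Learning for Visual Question Answering": score unlabeled samples by the predictive entropy of a single-modality (here image-only) model and send the most uncertain ones for labeling. The random scorer and feature pool are placeholders for a trained model and a real unlabeled pool.

```python
# Toy single-modal entropy acquisition: rank unlabeled VQA samples by the
# entropy of an image-only answer distribution, then label the top ones.
# The random "scorer" stands in for a trained single-modality model.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 5))   # toy map: image features -> 5 candidate answers

def image_only_probs(feat):
    logits = feat @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum()

pool = [rng.normal(size=16) for _ in range(100)]  # unlabeled image features
scores = np.array([entropy(image_only_probs(x)) for x in pool])
to_label = np.argsort(scores)[-10:]               # most uncertain samples
print(to_label)
```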