Fugu-MT Paper Translation (Abstract): Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence

Paper overview: Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence

arxiv url: http://arxiv.org/abs/2604.10404v1
Date: Sun, 12 Apr 2026 01:46:38 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.996153
Title: Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence
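Title (reference translation): Agentic Multimodal Transformers for Edge Medical Intelligence
Authors: Chengwei Zhou, Zhaoyan Jia, Haotian Yu, Xuming Chen, Brandon Lee, Christopher Pulliam, Steve Majerus, Massoud Pedram, Gourav Datta
Abstract summary: Adaptive Multimodal Intelligence (AMI) is an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context.
Reference score (attention score computed independently by this site): 11.75125432258758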
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context, enabling robust fusion even under gated or missing inputs. These components are trained jointly via a multi-objective loss combining classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding. AMI is hardware-aware, supporting dynamic computation graphs and masked operations, leading to real energy and latency savings. Across MHEALTH, HMC Sleep, and WESAD datasets, it reduces sensor usage by 48.8% while improving state-of-the-art accuracy by 1.9% on average.
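As an illustration of the first component, the following is a minimal sketch of generic Gumbel-Sigmoid gating, not the authors' implementation. It assumes one scalar logit per sensor (in AMI these would come from the controller; here they are hand-picked), a hypothetical `temperature` parameter, and plain Python without automatic differentiation, so the straight-through gradient that makes the gate trainable is not shown. The function names are invented for this example.

```python
import math
import random


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_logistic(rng):
    """Logistic(0, 1) noise, i.e. the difference of two Gumbel(0, 1) samples."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-9), 1.0 - 1e-9)
    return math.log(u) - math.log(1.0 - u)


def gumbel_sigmoid_gate(logits, temperature=1.0, rng=None):
    """Return (hard_gates, soft_gates) for a list of per-sensor logits.

    soft_gates are relaxed Bernoulli samples in [0, 1];
    hard_gates are those samples thresholded at 0.5 (1 = sensor on).
    """
    rng = rng or random.Random()
    soft = [
        stable_sigmoid((logit + sample_logistic(rng)) / temperature)
        for logit in logits
    ]
    hard = [1 if p > 0.5 else 0 for p in soft]
    return hard, soft


if __name__ == "__main__":
    # Hypothetical logits for the four sensor types named in the abstract.
    sensors = ["ECG", "PPG", "EMG", "IMU"]
    logits = [2.0, 0.0, -1.0, 3.0]
    hard, soft = gumbel_sigmoid_gate(logits, temperature=0.5, rng=random.Random(0))
    for name, h, s in zip(sensors, hard, soft):
        print(f"{name}: gate={h} soft={s:.3f}")
```

Lower temperatures push the soft samples toward 0 or 1, so the relaxed gate behaves more like a discrete on/off decision; in a trainable version the hard gate would typically be used in the forward pass and the soft value for gradients.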
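Abstract (reference translation): Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy against severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy in physiological signals. Adaptive Multimodal Intelligence (AMI) is an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context, enabling robust fusion even under gated or missing inputs. These components are trained jointly via a multi-objective loss combining classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding. AMI is hardware-aware, supporting dynamic computation graphs and masked operations, which leads to real energy and latency savings. Across the MHEALTH, HMC Sleep, and WESAD datasets, it reduces sensor usage by 48.8% while improving on state-of-the-art accuracy by 1.9% on average.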
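The idea of skipping temporally redundant patches can be illustrated with a simple delta-thresholding sketch. This is an assumption-laden stand-in for the Learned Sigma-Delta Sensing module, not a reproduction of it: the threshold is a fixed number rather than a learned parameter, there is no sigma (error-accumulation) stage, and the function name, the mean-absolute-difference criterion, and the hold-last-patch reconstruction are all choices made for this example.

```python
def delta_skip_patches(signal, patch_size, threshold):
    """Split a 1-D signal into patches and skip those close to the last sent patch.

    Returns (keep_mask, reconstructed):
      keep_mask[i] is True if patch i was transmitted, False if skipped.
      reconstructed repeats the last transmitted patch wherever one was skipped.
    """
    keep_mask = []
    reconstructed = []
    reference = None  # last patch that was actually transmitted
    for start in range(0, len(signal), patch_size):
        patch = signal[start:start + patch_size]
        if reference is None or len(patch) != len(reference):
            # Always send the first patch, and a shorter trailing patch.
            changed = True
        else:
            # Mean absolute difference against the last transmitted patch.
            delta = sum(abs(a - b) for a, b in zip(patch, reference)) / len(patch)
            changed = delta > threshold
        if changed:
            reference = patch
        keep_mask.append(changed)
        reconstructed.extend(reference[:len(patch)])
    return keep_mask, reconstructed


if __name__ == "__main__":
    # Toy piecewise-constant signal: half of the patches repeat the previous one.
    signal = [0.0] * 8 + [1.0] * 8
    mask, recon = delta_skip_patches(signal, patch_size=4, threshold=0.1)
    usage = sum(mask) / len(mask)
    print("kept patches:", mask)
    print("fraction of patches transmitted:", usage)
```

The fraction of transmitted patches is a rough analogue of the "sensor usage" figure quoted in the abstract, although the toy signal here says nothing about the reductions reported on MHEALTH, HMC Sleep, or WESAD.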
