Related papers: Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization

Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization

URL: http://arxiv.org/abs/2510.20853v1
Date: Wed, 22 Oct 2025 05:11:02 GMT
Title: Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization
Authors: Hyungjun Yoon, Seungjoo Lee, Yu Yvonne Wu, Xiaomeng Chen, Taiting Lu, Freddy Yifei Liu, Taeckyung Lee, Hyeongheon Cha, Haochen Zhao, Gaoteng Zhao, Sung-Ju Lee, Cecilia Mascolo, Dongyao Chen, Lili Qiu,
Abstract summary: We introduce an approach for scalable, task-agnostic ExG monitoring in the wild.<n>At the core of our approach is Physiology-informed Multi-band Tokenization (PiMT), which decomposes ExG signals into 12 physiology-informed tokens.<n>Experiments on our new DailySense dataset-the first to enable ExG-based analysis across five human senses-demonstrate that PiMT consistently outperforms state-of-the-art methods across diverse tasks.
Score: 24.30887065380288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electrophysiological (ExG) signals offer valuable insights into human physiology, yet building foundation models that generalize across everyday tasks remains challenging due to two key limitations: (i) insufficient data diversity, as most ExG recordings are collected in controlled labs with bulky, expensive devices; and (ii) task-specific model designs that require tailored processing (i.e., targeted frequency filters) and architectures, which limit generalization across tasks. To address these challenges, we introduce an approach for scalable, task-agnostic ExG monitoring in the wild. We collected 50 hours of unobtrusive free-living ExG data with an earphone-based hardware prototype to narrow the data diversity gap. At the core of our approach is Physiology-informed Multi-band Tokenization (PiMT), which decomposes ExG signals into 12 physiology-informed tokens, followed by a reconstruction task to learn robust representations. This enables adaptive feature recognition across the full frequency spectrum while capturing task-relevant information. Experiments on our new DailySense dataset-the first to enable ExG-based analysis across five human senses-together with four public ExG benchmarks, demonstrate that PiMT consistently outperforms state-of-the-art methods across diverse tasks.

Related papers

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis [14.922065513695294]
Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adrial Curriculum Agent (Thinker-A$2$CA)<n>To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention.<n>To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
arXiv Detail & Related papers (2026-02-16T14:48:24Z)
FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology.<n>FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z)
FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition.<n>We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model.<n>We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks [2.0906347671401018]
We present SlotFM, an accelerometer foundation model that generalizes across diverse downstream tasks.<n>We evaluate SlotFM on 16 classification and regression downstream tasks that extend beyond standard human activity recognition.
arXiv Detail & Related papers (2025-09-25T22:41:43Z)
A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use.<n>We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData.<n>With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation [64.23194519770897]
We build a comprehensive RL-ready visual reasoning dataset from 46 data sources across 8 dimensions.<n>We propose an influence function based data selection and difficulty based filtering strategy to identify high-quality training samples from this dataset.<n>We train the VLM, referred to as Vision-G1, using multi-round RL with a data curriculum to iteratively improve its visual reasoning capabilities.
arXiv Detail & Related papers (2025-08-18T07:24:33Z)
Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM. Our results establish the scaling laws of LSM for tasks such as imputation, extrapolation, both across time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement [24.424510759648072]
This paper presents an end-to-end Mixture of Low-rank Experts for multi-task remote Physiological measurement (PhysMLE) PhysMLE is based on multiple low-rank experts with a novel router mechanism, enabling the model to adeptly handle both specifications and correlations within tasks. For fair and comprehensive evaluations, this paper proposed a large-scale multi-task generalization benchmark, named Multi-Source Synsemantic Domain Generalization protocol.
arXiv Detail & Related papers (2024-05-10T02:36:54Z)
Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN) CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
Self-supervised transfer learning of physiological representations from free-living wearable data [12.863826659440026]
We present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels. We evaluate our model in the largest free-living combined-sensing dataset (comprising >280k hours of wrist accelerometer & wearable ECG data)
arXiv Detail & Related papers (2020-11-18T23:21:34Z)
Learning Generalizable Physiological Representations from Large-scale Wearable Data [12.863826659440026]
We present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels. We show that the resulting embeddings can generalize in various downstream tasks through transfer learning with linear classifiers. Overall, we propose the first multimodal self-supervised method for behavioral and physiological data with implications for large-scale health and lifestyle monitoring.
arXiv Detail & Related papers (2020-11-09T17:56:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.