CARE: Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams
- URL: http://arxiv.org/abs/2510.16988v2
- Date: Thu, 30 Oct 2025 18:47:45 GMT
- Title: CARE: Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams
- Authors: Junhao Zhao, Zishuai Liu, Ruili Fang, Jin Lu, Linghan Zhang, Fei Dou
- Abstract summary: We propose Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams (CARE). CARE is an end-to-end framework that jointly optimizes representation learning via Sequence-Image Contrastive Alignment (SICA) and classification via cross-entropy. CARE achieves state-of-the-art performance (89.8% on Milan, 88.9% on Cairo, and 73.3% on Kyoto7) and demonstrates robustness to sensor malfunctions and layout variability.
- Score: 4.3359440506714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recognition of Activities of Daily Living (ADLs) from event-triggered ambient sensors is an essential task in Ambient Assisted Living, yet existing methods remain constrained by representation-level limitations. Sequence-based approaches preserve the temporal order of sensor activations but are sensitive to noise and lack spatial awareness, while image-based approaches capture global patterns and implicit spatial correlations but compress fine-grained temporal dynamics and distort sensor layouts. Naive fusion (e.g., feature concatenation) fails to enforce alignment between sequence- and image-based representation views, underutilizing their complementary strengths. We propose Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams (CARE), an end-to-end framework that jointly optimizes representation learning via Sequence-Image Contrastive Alignment (SICA) and classification via cross-entropy, ensuring both cross-representation alignment and task-specific discriminability. CARE integrates (i) time-aware, noise-resilient sequence encoding with (ii) spatially-informed and frequency-sensitive image representations, and employs (iii) a joint contrastive-classification objective for end-to-end learning of aligned and discriminative embeddings. Evaluated on three CASAS datasets, CARE achieves state-of-the-art performance (89.8% on Milan, 88.9% on Cairo, and 73.3% on Kyoto7) and demonstrates robustness to sensor malfunctions and layout variability, highlighting its potential for reliable ADL recognition in smart homes.
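The abstract describes a joint contrastive-classification objective (SICA alignment plus cross-entropy) but the listing gives no implementation details. The following PyTorch sketch shows one plausible reading of such a loss; the InfoNCE-style formulation, the function name sica_joint_loss, and the temperature/alpha weighting are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def sica_joint_loss(seq_emb, img_emb, logits, labels, temperature=0.1, alpha=0.5):
    """Hypothetical joint objective: InfoNCE-style sequence-image alignment
    plus cross-entropy classification (a sketch, not the paper's exact loss)."""
    # L2-normalize both views so the dot product acts as a cosine similarity
    z_seq = F.normalize(seq_emb, dim=-1)   # (B, D) sequence-view embeddings
    z_img = F.normalize(img_emb, dim=-1)   # (B, D) image-view embeddings

    # Pairwise similarities; matching sequence/image pairs lie on the diagonal
    sim = z_seq @ z_img.t() / temperature  # (B, B)
    targets = torch.arange(sim.size(0), device=sim.device)

    # Symmetric contrastive term: align seq->img and img->seq
    loss_align = 0.5 * (F.cross_entropy(sim, targets) +
                        F.cross_entropy(sim.t(), targets))

    # Task term: standard cross-entropy on the ADL class logits
    loss_cls = F.cross_entropy(logits, labels)

    return alpha * loss_align + (1.0 - alpha) * loss_cls
```

In this reading, seq_emb and img_emb would come from the sequence encoder and the image encoder applied to the same sensor-event window, so matching pairs sit on the diagonal of the similarity matrix.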
Related papers
- Semi-Supervised Hyperspectral Image Classification with Edge-Aware Superpixel Label Propagation and Adaptive Pseudo-Labeling [5.022329161015679]
We propose a novel semi-supervised hyperspectral classification framework integrating spatial prior information with a dynamic learning mechanism. We introduce a Dynamic History-Fused Prediction (DHP) method to smooth pseudo-label fluctuations and improve temporal consistency and noise resistance. The Dynamic Reliability-Enhanced Pseudo-Label Framework (DREPL) strengthens pseudo-label stability across temporal and sample domains.
arXiv Detail & Related papers (2026-01-26T00:31:08Z) - Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations [4.581671524490035]
We propose an end-to-end Staged Voxel-Level Deep Reinforcement Learning framework for robust medical image segmentation under noisy annotations. This framework employs a dynamic iterative update strategy to automatically mitigate the impact of erroneous labels without requiring manual intervention.
arXiv Detail & Related papers (2026-01-07T12:39:54Z) - Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment [83.56510119503265]
Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) employs register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot-image correspondence.
arXiv Detail & Related papers (2026-01-03T16:10:18Z) - Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception [71.26728044621458]
DeCLIP is a novel framework that enhances CLIP by decoupling the self-attention module to obtain "content" and "context" features respectively. It consistently achieves state-of-the-art performance across a broad spectrum of tasks, including 2D detection and segmentation, 3D instance segmentation, video instance segmentation, and 6D object pose estimation.
arXiv Detail & Related papers (2025-08-15T06:43:51Z) - Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition [11.11236920942621]
Zero-shot skeleton-based action recognition aims to identify actions beyond the categories encountered during training. Previous approaches have primarily focused on aligning visual and semantic representations. We propose a Frequency-Semantic Enhanced Variational Autoencoder (FS-VAE) to explore skeleton semantic representation learning with frequency decomposition.
arXiv Detail & Related papers (2025-06-27T12:44:08Z) - ADLGen: Synthesizing Symbolic, Event-Triggered Sensor Sequences for Human Activity Modeling [9.526073030523733]
ADLGen is a generative framework designed to synthesize realistic, event-triggered, and symbolic sensor sequences. ADLGen is shown to outperform baseline generators in statistical fidelity, semantic richness, and downstream activity recognition.
arXiv Detail & Related papers (2025-05-23T14:52:48Z) - CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching [31.42896369011162]
CoMatch is a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy. A covisibility-guided token condenser is introduced to adaptively aggregate tokens in light of their covisibility scores. A fine correlation module is developed to refine the matching candidates in both source and target views to subpixel level.
arXiv Detail & Related papers (2025-03-31T10:17:01Z) - EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting [108.15136508964011]
EgoSplat is a language-embedded 3D Gaussian Splatting framework for open-vocabulary egocentric scene understanding. EgoSplat achieves state-of-the-art performance in both localization and segmentation tasks on two datasets.
arXiv Detail & Related papers (2025-03-14T12:21:26Z) - Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z) - DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models [18.342569823885864]
3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings. Such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. We jointly predict unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation. We evaluate our approach on autonomous driving datasets, and it achieves state-of-the-art performance for SSC, surpassing most existing methods.
arXiv Detail & Related papers (2024-09-26T17:39:05Z) - Graph-Aware Contrasting for Multivariate Time-Series Classification [50.84488941336865]
Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques.
We propose Graph-Aware Contrasting for spatial consistency across MTS data.
Our proposed method achieves state-of-the-art performance on various MTS classification tasks.
arXiv Detail & Related papers (2023-09-11T02:35:22Z) - UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for consistency representation learning.
arXiv Detail & Related papers (2022-10-23T15:24:47Z) - Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion [51.11295961195151]
We exploit the characteristics of the foggy image sequence of driving scenes to densify the confident pseudo labels.
Based on the two discoveries of local spatial similarity and adjacent temporal correspondence of the sequential image data, we propose a novel Target-Domain driven pseudo label Diffusion scheme.
Our scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets.
arXiv Detail & Related papers (2022-06-10T05:16:50Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between the images and their neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.