Contrastive Learning with Auxiliary User Detection for Identifying Activities
- URL: http://arxiv.org/abs/2410.21300v1
- Date: Mon, 21 Oct 2024 09:04:23 GMT
- Title: Contrastive Learning with Auxiliary User Detection for Identifying Activities
- Authors: Wen Ge, Guanyi Mou, Emmanuel O. Agu, Kyumin Lee
- Abstract summary: We argue that addressing the impact of innate user action-performing differences is equally crucial as considering external contextual environment settings.
We introduce CLAUDIA, a novel framework designed to address these issues.
Evaluation across three real-world CA-HAR datasets reveals substantial performance enhancements.
- Score: 2.8132886759540146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human Activity Recognition (HAR) is essential in ubiquitous computing, with far-reaching real-world applications. While recent SOTA HAR research has demonstrated impressive performance, some key aspects remain under-explored. Firstly, HAR can be both highly contextualized and personalized. However, prior work has predominantly focused on being Context-Aware (CA) while largely ignoring the necessity of being User-Aware (UA). We argue that addressing the impact of innate user action-performing differences is equally crucial as considering external contextual environment settings in HAR tasks. Secondly, being user-aware makes the model acknowledge user discrepancies but does not necessarily guarantee mitigation of these discrepancies, i.e., unified predictions under the same activities. There is a need for a methodology that explicitly enforces closer (different user, same activity) representations. To bridge this gap, we introduce CLAUDIA, a novel framework designed to address these issues. Specifically, we expand the contextual scope of the CA-HAR task by integrating User Identification (UI) within the CA-HAR framework, jointly predicting both CA-HAR and UI in a new task called User and Context-Aware HAR (UCA-HAR). This approach enriches personalized and contextual understanding by jointly learning user-invariant and user-specific patterns. Inspired by SOTA designs in the visual domain, we introduce a supervised contrastive loss objective on instance-instance pairs to enhance model efficacy and improve learned feature quality. Evaluation across three real-world CA-HAR datasets reveals substantial performance enhancements, with average improvements ranging from 5.8% to 14.1% in Matthews Correlation Coefficient and 3.0% to 7.2% in Macro F1 score.
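The supervised contrastive objective on instance-instance pairs mentioned in the abstract can be sketched as below. This is a generic SupCon-style loss, not the authors' exact implementation; the function name, NumPy formulation, and temperature value are illustrative. Instances sharing an activity label are treated as positives, which pulls (different user, same activity) representations closer, matching the motivation stated above.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """SupCon-style loss over a batch of embeddings.

    Positives for an anchor are the other instances in the batch with the
    same (activity) label; all non-anchor instances form the denominator.
    """
    # L2-normalize so similarities are cosine similarities
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # pairwise scaled similarities
    n = len(labels)

    # Exclude self-similarity from both numerator and denominator
    logits_mask = ~np.eye(n, dtype=bool)

    # Numerically stable log-softmax over the other instances
    sim_max = np.max(np.where(logits_mask, sim, -np.inf), axis=1, keepdims=True)
    shifted = np.where(logits_mask, sim - sim_max, -np.inf)
    log_prob = (sim - sim_max) - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same label, different instance
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    pos_counts = pos_mask.sum(axis=1)

    # Average -log p over each anchor's positives, then over valid anchors
    loss_i = -(log_prob * pos_mask).sum(axis=1) / np.maximum(pos_counts, 1)
    return loss_i[pos_counts > 0].mean()
```

A batch where same-activity instances have near-identical embeddings yields a loss near zero, while a batch where positives point in different directions yields a large loss, which is the gradient signal that enforces unified predictions under the same activity.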
Related papers
- Embedded Inter-Subject Variability in Adversarial Learning for Inertial Sensor-Based Human Activity Recognition [9.165849342869407]
This paper addresses the problem of Human Activity Recognition (HAR) using data from wearable inertial sensors. An important challenge in HAR is the model's ability to generalize in the face of inter-subject variability. We propose a novel deep adversarial framework that integrates the concept of inter-subject variability into the adversarial task.
arXiv Detail & Related papers (2026-03-05T16:57:15Z) - On-device Large Multi-modal Agent for Human Activity Recognition [1.9342524451932614]
Human Activity Recognition (HAR) has been an active area of research, with applications ranging from healthcare to smart environments. Recent advancements in Large Language Models (LLMs) have opened new possibilities to leverage their capabilities in HAR. We present a Large Multi-Modal Agent designed for HAR, which integrates the power of LLMs to enhance both performance and user engagement.
arXiv Detail & Related papers (2025-12-17T22:05:05Z) - Leveraging Scene Context with Dual Networks for Sequential User Behavior Modeling [58.72480539725212]
We propose a novel Dual Sequence Prediction network (DSPnet) to capture the dynamic interests and the interplay between scenes and items for future behavior prediction. DSPnet consists of two parallel networks dedicated to learning users' dynamic interests over items and scenes, and a sequence feature enhancement module that captures this interplay for enhanced future behavior prediction.
arXiv Detail & Related papers (2025-09-30T12:26:57Z) - Personalized Vision via Visual In-Context Learning [62.85784251383279]
We present a visual in-context learning framework for personalized vision. PICO infers the underlying transformation and applies it to new inputs without retraining. We also propose an attention-guided seed scorer that improves reliability via efficient inference scaling.
arXiv Detail & Related papers (2025-09-29T17:58:45Z) - Bridging Generalization and Personalization in Human Activity Recognition via On-Device Few-Shot Learning [16.255569673010122]
Human Activity Recognition (HAR) with different sensing modalities requires strong generalization across diverse users and efficient personalization for individuals. We propose a novel on-device few-shot learning framework that bridges generalization and personalization in HAR. We implement our framework on the energy-efficient RISC-V GAP9 microcontroller and evaluate it on three benchmark datasets.
arXiv Detail & Related papers (2025-08-21T10:08:20Z) - CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization [4.171555557592296]
We propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR). CLEAR separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style.
arXiv Detail & Related papers (2025-07-24T20:31:21Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code-generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics.
The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - Heterogeneous Hyper-Graph Neural Networks for Context-aware Human Activity Recognition [2.8132886759540146]
We argue that context-aware activity visit patterns in realistic in-the-wild data can be considered as a general graph representation learning task.
We propose a novel Heterogeneous HyperGraph Neural Network architecture for Context-aware Human Activity Recognition.
arXiv Detail & Related papers (2024-09-26T02:44:37Z) - Spatio-Temporal Context Prompting for Zero-Shot Action Detection [13.22912547389941]
We propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction.
To address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism.
Our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
arXiv Detail & Related papers (2024-08-28T17:59:05Z) - FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation [73.454943870226]
Language models have shown impressive in-context-learning capabilities.
We propose a measure called FamiCom, providing a more comprehensive measure for task-agnostic performance estimation.
arXiv Detail & Related papers (2024-06-17T06:14:55Z) - Continual Facial Expression Recognition: A Benchmark [3.181579197770883]
This work presents the Continual Facial Expression Recognition (ConFER) benchmark that evaluates popular CL techniques on FER tasks.
It presents a comparative analysis of several CL-based approaches on popular FER datasets such as CK+, RAF-DB, and AffectNet.
CL techniques, under different learning settings, are shown to achieve state-of-the-art (SOTA) performance across several datasets.
arXiv Detail & Related papers (2023-05-10T20:35:38Z) - On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z) - Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Invariant Feature Learning for Sensor-based Human Activity Recognition [11.334750079923428]
We present an invariant feature learning framework (IFLF) that extracts common information shared across subjects and devices.
Experiments demonstrated that IFLF is effective in handling both subject and device variation across popular open datasets and an in-house dataset.
arXiv Detail & Related papers (2020-12-14T21:56:17Z) - ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of ⟨human, action, object⟩ triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z) - Mining Implicit Entity Preference from User-Item Interaction Data for
Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.