CaFlow: Enhancing Long-Term Action Quality Assessment with Causal Counterfactual Flow
- URL: http://arxiv.org/abs/2511.21653v1
- Date: Wed, 26 Nov 2025 18:25:41 GMT
- Authors: Ruisheng Han, Kanglei Zhou, Shuang Chen, Amir Atapour-Abarghouei, Hubert P. H. Shum,
- Abstract summary: Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos. Long-term AQA, as in figure skating or rhythmic gymnastics, is especially challenging since it requires modeling extended temporal dynamics. We propose CaFlow, a unified framework that integrates counterfactual de-confounding with bidirectional time-conditioned flow.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos and is widely applied in sports, rehabilitation, and skill evaluation. Long-term AQA, as in figure skating or rhythmic gymnastics, is especially challenging since it requires modeling extended temporal dynamics while remaining robust to contextual confounders. Existing approaches either depend on costly annotations or rely on unidirectional temporal modeling, making them vulnerable to spurious correlations and unstable long-term representations. To this end, we propose CaFlow, a unified framework that integrates counterfactual de-confounding with bidirectional time-conditioned flow. The Causal Counterfactual Regularization (CCR) module disentangles causal and confounding features in a self-supervised manner and enforces causal robustness through counterfactual interventions, while the BiT-Flow module models forward and backward dynamics with a cycle-consistency constraint to produce smoother and more coherent representations. Extensive experiments on multiple long-term AQA benchmarks demonstrate that CaFlow achieves state-of-the-art performance. Code is available at https://github.com/Harrison21/CaFlow
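The abstract describes the two modules only at a high level. The NumPy sketch below illustrates the two underlying ideas under stated assumptions. First, in the spirit of BiT-Flow, a toy time-conditioned velocity field is integrated forward and backward with explicit Euler steps and scored by a cycle-consistency loss. Second, in the spirit of CCR, pooled features are split into "causal" and "confounder" parts, each video is given another video's confounder as an intervention, and any resulting change in the predicted score is penalized. Every function name, the tanh velocity field, the linear projections (random here, learned in practice), and the cyclic-shift intervention are hypothetical choices for illustration. This is not CaFlow's implementation, which is in the linked repository.

```python
import numpy as np

# Hypothetical sketch, not CaFlow's code. Toy sizes: B videos, T clips per video,
# D feature dims, K dims for each of the causal / confounder subspaces.
B, T, D, K = 4, 8, 16, 6


def velocity(x, t, W, b):
    """Toy time-conditioned velocity field v(x, t) = tanh(x W + t b)."""
    return np.tanh(x @ W + t * b)


def integrate(x, W, b, steps=10, forward=True):
    """Explicit Euler integration over t in [0, 1]: forward runs t = 0 -> 1,
    backward runs t = 1 -> 0 by stepping against the velocity."""
    dt = 1.0 / steps
    for k in range(steps):
        if forward:
            x = x + dt * velocity(x, k * dt, W, b)
        else:
            x = x - dt * velocity(x, 1.0 - k * dt, W, b)
    return x


def cycle_consistency_loss(x, fwd, bwd, steps=10):
    """MSE between x and its forward-then-backward reconstruction.
    fwd and bwd are (W, b) parameter pairs for the two directions."""
    y = integrate(x, *fwd, steps=steps, forward=True)
    x_rec = integrate(y, *bwd, steps=steps, forward=False)
    return float(np.mean((x - x_rec) ** 2))


def counterfactual_loss(h, P_c, P_z, w_c, w_z):
    """Project pooled features h (B, D) into causal c and confounder z parts
    (hypothetical linear projections), intervene by giving each video another
    video's confounder via a cyclic shift, and penalize the change in a linear score."""
    c, z = h @ P_c, h @ P_z
    factual = c @ w_c + z @ w_z
    counterfactual = c @ w_c + np.roll(z, 1, axis=0) @ w_z
    return float(np.mean((factual - counterfactual) ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(B, T, D))  # random stand-in for clip features of a batch of videos
fwd = (0.1 * rng.normal(size=(D, D)), 0.1 * rng.normal(size=D))
rand_bwd = (0.1 * rng.normal(size=(D, D)), 0.1 * rng.normal(size=D))

# Reusing the forward field backward nearly inverts the flow (only Euler error remains).
loss_shared = cycle_consistency_loss(X, fwd, fwd)
loss_random = cycle_consistency_loss(X, fwd, rand_bwd)

h = integrate(X, *fwd).mean(axis=1)  # flow-refined clip features, pooled per video -> (B, D)
P_c, P_z = rng.normal(size=(D, K)), rng.normal(size=(D, K))
w_c, w_z = rng.normal(size=K), rng.normal(size=K)
l_cf = counterfactual_loss(h, P_c, P_z, w_c, w_z)  # positive: the score depends on z
l_cf_invariant = counterfactual_loss(h, P_c, P_z, w_c, np.zeros(K))  # exactly 0.0
```

Reusing the forward field for the backward pass leaves only discretization error, so its cycle loss comes out much smaller than with an unrelated backward field. Setting the confounder weights to zero makes the score invariant to the intervention, so the counterfactual penalty is exactly zero; that invariance is what a regularizer of this kind pushes a model toward. The cyclic shift (`np.roll`) is used instead of a random permutation so that every video is guaranteed to receive a different video's confounder.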
Related papers
- Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching
We introduce PAFM, a framework that models perturbed trajectories to ensure stable and structurally consistent time series generation. The framework incorporates perturbation-guided training to simulate localized disturbances and leverages a dual-path velocity field to capture trajectory deviations under perturbation. In experiments on both unconditional and conditional generation tasks, PAFM consistently outperforms strong baselines.
Date: 2025-11-18T13:30:56Z
- TimeFlow: Towards Stochastic-Aware and Efficient Time Series Generation via Flow Matching Modeling
Time series data has emerged as a critical research topic due to its broad utility in supporting downstream time series mining tasks. We propose TimeFlow, a novel flow matching framework that integrates an encoder-only architecture. Our model consistently outperforms strong baselines in generation quality, diversity, and efficiency.
Date: 2025-11-11T08:28:26Z
- Are Large Reasoning Models Interruptible?
Large Reasoning Models (LRMs) excel at complex reasoning but are traditionally evaluated in static, "frozen world" settings. We show that even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context. Our analysis further reveals several novel failure modes, including reasoning leakage, panic, and self-doubt.
Date: 2025-10-13T17:59:35Z
- Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation. A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios. We introduce Continual AQA (CAQA), which equips AQA with Continual Learning capabilities to handle evolving distributions.
Date: 2025-10-08T10:09:47Z
- Unified Flow Matching for Long Horizon Event Forecasting
We propose a unified flow matching framework for marked temporal point processes. By learning continuous-time flows for both components, our method generates coherent long horizon event trajectories without sequential decoding. We evaluate our model on six real-world benchmarks and demonstrate significant improvements over autoregressive and diffusion-based baselines in both accuracy and generation efficiency.
Date: 2025-08-06T19:42:49Z
- FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment
We introduce FineCausal, a novel causal-based framework that achieves state-of-the-art performance on the FineDiving-HM dataset. Our approach leverages a Graph Attention Network-based causal intervention module to disentangle human-centric cues from background confounders. Our dual-module strategy enables FineCausal to generate detailed spatio-temporal representations that not only achieve state-of-the-art scoring performance but also provide transparent, interpretable feedback on which features drive the assessment.
Date: 2025-03-31T10:02:29Z
- FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
FELLE is an autoregressive model that integrates language modeling with token-wise flow matching. For each continuous-valued token, FELLE modifies the general prior distribution in flow matching by incorporating information from the previous step. FELLE generates continuous-valued tokens hierarchically, conditioned on the language model's output.
Date: 2025-02-16T13:54:32Z
- Interpretable Long-term Action Quality Assessment
Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos.
Current AQA methods produce a single score by averaging clip features.
Long-term videos pose additional difficulty due to the complexity and diversity of actions.
Date: 2024-08-21T15:09:09Z
- ChiroDiff: Modelling chirographic data with Diffusion Models
We introduce a powerful model class, Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Being non-autoregressive, our model, ChiroDiff, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates.
Date: 2023-04-07T15:17:48Z
- BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation
We present BSN++, a new framework that exploits a complementary boundary regressor and relation modeling for temporal proposal generation.
The proposed BSN++ ranked first on the CVPR 2019 ActivityNet challenge leaderboard for the temporal action localization task.
Date: 2020-09-15T07:08:59Z
- Learn to cycle: Time-consistent feature discovery for action recognition
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
Date: 2020-06-15T09:36:28Z