Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation
- URL: http://arxiv.org/abs/2501.14356v1
- Date: Fri, 24 Jan 2025 09:45:16 GMT
- Title: Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation
- Authors: Haipeng Chen, Sifan Wu, Zhigang Wang, Yifang Yin, Yingying Jiao, Yingda Lyu, Zhenguang Liu,
- Abstract summary: We introduce a causal-temporal modeling framework consisting of two stages.
The first stage endows the model with causal-temporal modeling ability by introducing two self-supervision auxiliary tasks.
In the second stage, we argue that not all feature tokens contribute equally to pose estimation.
Our method outperforms state-of-the-art methods on three large-scale benchmark datasets.
- Score: 18.826857684901118
- License:
- Abstract: Video-based human pose estimation has long been a fundamental yet challenging problem in computer vision. Previous studies focus on spatio-temporal modeling through the enhancement of architecture design and optimization strategies. However, they overlook the causal relationships in the joints, leading to models that may be overly tailored and thus estimate poorly to challenging scenes. Therefore, adequate causal reasoning capability, coupled with good interpretability of model, are both indispensable and prerequisite for achieving reliable results. In this paper, we pioneer a causal perspective on pose estimation and introduce a causal-inspired multitask learning framework, consisting of two stages. \textit{In the first stage}, we try to endow the model with causal spatio-temporal modeling ability by introducing two self-supervision auxiliary tasks. Specifically, these auxiliary tasks enable the network to infer challenging keypoints based on observed keypoint information, thereby imbuing causal reasoning capabilities into the model and making it robust to challenging scenes. \textit{In the second stage}, we argue that not all feature tokens contribute equally to pose estimation. Prioritizing causal (keypoint-relevant) tokens is crucial to achieve reliable results, which could improve the interpretability of the model. To this end, we propose a Token Causal Importance Selection module to identify the causal tokens and non-causal tokens (\textit{e.g.}, background and objects). Additionally, non-causal tokens could provide potentially beneficial cues but may be redundant. We further introduce a non-causal tokens clustering module to merge the similar non-causal tokens. Extensive experiments show that our method outperforms state-of-the-art methods on three large-scale benchmark datasets.
Related papers
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Scaling Capability in Token Space: An Analysis of Large Vision Language Model [27.59879939490807]
We investigate the relationship between the number of vision tokens and the performance on vision-language models.
We also investigate the impact of a fusion mechanism that integrates the user's question with vision tokens.
arXiv Detail & Related papers (2024-12-24T12:20:24Z) - Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model [1.7955614278088239]
We introduce PMoC, a deep-learning-based probabilistic model, achieving high reasoning accuracy in the Bongard-Logo.
As a bonus, we also designed Pose-Transformer for complex visual abstract reasoning tasks.
arXiv Detail & Related papers (2024-03-05T18:08:29Z) - Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations.
We propose a novel, theoretically justified method called Causal Inference with Attention (CInA)
arXiv Detail & Related papers (2023-10-01T22:28:34Z) - Causal Triplet: An Open Challenge for Intervention-centric Causal
Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z) - Fairness Increases Adversarial Vulnerability [50.90773979394264]
This paper shows the existence of a dichotomy between fairness and robustness, and analyzes when achieving fairness decreases the model robustness to adversarial samples.
Experiments on non-linear models and different architectures validate the theoretical findings in multiple vision domains.
The paper proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness.
arXiv Detail & Related papers (2022-11-21T19:55:35Z) - Towards Improving Faithfulness in Abstractive Summarization [37.19777407790153]
We propose a Faithfulness Enhanced Summarization model (FES) to improve fidelity in abstractive summarization.
Our model outperforms strong baselines in experiments on CNN/DM and XSum.
arXiv Detail & Related papers (2022-10-04T19:52:09Z) - Task Formulation Matters When Learning Continually: A Case Study in
Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Dependent Multi-Task Learning with Causal Intervention for Image
Captioning [10.6405791176668]
In this paper, we propose a dependent multi-task learning framework with the causal intervention (DMTCI)
Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning.
Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders.
Finally, we use a multi-agent reinforcement learning strategy to enable end-to-end training and reduce the inter-task error accumulations.
arXiv Detail & Related papers (2021-05-18T14:57:33Z) - Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.