Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
- URL: http://arxiv.org/abs/2211.12883v3
- Date: Wed, 27 Sep 2023 09:44:29 GMT
- Title: Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
- Authors: Haoxin Li, Yuan Liu, Hanwang Zhang, Boyang Li
- Abstract summary: Shortcut static features can interfere with the learning of motion features, resulting in poor out-of-distribution generalization.
In this paper, we empirically verify the existence of foreground static bias by creating test videos with conflicting signals from the static and moving portions of the video.
StillMix identifies bias-inducing video frames using a 2D reference network and mixes them into training videos, which effectively suppresses static bias.
- Score: 59.916365866505636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In video action recognition, shortcut static features can interfere with the
learning of motion features, resulting in poor out-of-distribution (OOD)
generalization. The video background is clearly a source of static bias, but
the video foreground, such as the clothing of the actor, can also provide
static bias. In this paper, we empirically verify the existence of foreground
static bias by creating test videos with conflicting signals from the static
and moving portions of the video. To tackle this issue, we propose a simple yet
effective technique, StillMix, to learn robust action representations.
Specifically, StillMix identifies bias-inducing video frames using a 2D
reference network and mixes them with videos for training, serving as effective
bias suppression even when we cannot explicitly extract the source of bias
within each video frame or enumerate types of bias. Finally, to precisely
evaluate static bias, we synthesize two new benchmarks, SCUBA for static cues
in the background, and SCUFO for static cues in the foreground. With extensive
experiments, we demonstrate that StillMix mitigates both types of static bias
and improves video representations for downstream applications. Code is
available at https://github.com/lihaoxin05/StillMix.
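The core operation behind StillMix is simple: interpolate each training clip with a static frame that the 2D reference network has flagged as bias-inducing, so that static appearance stops predicting the label. Below is a minimal sketch of that mixing step in PyTorch; the function name, the Beta-distributed mixing weight, and the choice to keep the original action label are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def stillmix(videos, still_frames, alpha=0.3):
    """Mixup-style blending of clips with bias-inducing static frames.

    videos:       (B, C, T, H, W) training clips
    still_frames: (B, C, H, W) frames flagged by the 2D reference network
    Returns mixed clips; the original action labels are kept, since the
    injected frame contributes no motion signal.
    """
    b = videos.size(0)
    # One mixing weight per clip, drawn from a Beta distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample((b,))
    lam = lam.view(b, 1, 1, 1, 1).to(videos.device)
    # Broadcast each static frame across the temporal axis, then blend.
    still = still_frames.unsqueeze(2)  # (B, C, 1, H, W)
    return lam * videos + (1.0 - lam) * still
```

Because the injected frame carries no motion, any predictive power it retains must come from static cues, and blending it into many different clips breaks that shortcut.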
Related papers
- ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition [52.537021302246664]
Action recognition models often suffer from background bias (i.e., inferring actions based on background cues) and foreground bias (i.e., relying on subject appearance).
We propose ALBAR, a novel adversarial training method that mitigates foreground and background biases without requiring specialized knowledge of the bias attributes.
We evaluate our method on established background and foreground bias protocols, setting a new state-of-the-art and strongly improving combined debiasing performance by over 12% on HMDB51.
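The summary does not spell out ALBAR's objectives, so the following is only a generic adversarial-debiasing sketch using a gradient reversal layer: a bias head tries to predict a hypothetical scene label while the reversed gradient pushes the shared encoder to discard it. Note that ALBAR itself avoids requiring such bias labels; all module names and dimensions here are made up for illustration.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Stand-in modules over precomputed 512-d clip features.
encoder = nn.Linear(512, 128)
action_head = nn.Linear(128, 10)  # 10 hypothetical action classes
bias_head = nn.Linear(128, 5)     # 5 hypothetical scene (bias) classes
ce = nn.CrossEntropyLoss()

feats = encoder(torch.randn(8, 512))
action_loss = ce(action_head(feats), torch.randint(0, 10, (8,)))
# The bias head minimizes its loss, but the reversed gradient makes the
# encoder *maximize* it, stripping scene cues from the shared features.
bias_loss = ce(bias_head(GradReverse.apply(feats)), torch.randint(0, 5, (8,)))
(action_loss + bias_loss).backward()
```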
arXiv Detail & Related papers (2025-01-31T20:47:06Z)
- Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video [67.24316233946381]
Temporal Sentence Grounding in Video (TSGV) suffers from a dataset bias issue.
We propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD).
arXiv Detail & Related papers (2024-01-15T09:59:43Z)
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- SOAR: Scene-debiasing Open-set Action Recognition [81.8198917049666]
We propose Scene-debiasing Open-set Action Recognition (SOAR), which features an adversarial scene reconstruction module and an adaptive adversarial scene classification module.
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
The latter aims to confuse scene type classification given video features, with a specific emphasis on the action foreground, and helps to learn scene-invariant information.
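Concretely, the scene-reconstruction adversary can be read as a two-player game: the decoder learns to recover the background from video features while the encoder is updated to defeat it. The sketch below illustrates that alternating update with stand-in linear modules and made-up dimensions; it is not SOAR's actual architecture.

```python
import torch
from torch import nn, optim

# Stand-ins: the encoder maps 512-d clip descriptors to 128-d features;
# the decoder tries to reconstruct a 512-d background descriptor from them.
encoder = nn.Linear(512, 128)
decoder = nn.Linear(128, 512)
opt_enc = optim.Adam(encoder.parameters(), lr=1e-4)
opt_dec = optim.Adam(decoder.parameters(), lr=1e-4)
mse = nn.MSELoss()

clip = torch.randn(8, 512)        # batch of clip descriptors
background = torch.randn(8, 512)  # matching background descriptors

# Decoder step: learn to reconstruct the background (encoder detached).
dec_loss = mse(decoder(encoder(clip).detach()), background)
opt_dec.zero_grad()
dec_loss.backward()
opt_dec.step()

# Encoder step: maximize the reconstruction error, i.e., remove
# background information from the learned features.
enc_loss = -mse(decoder(encoder(clip)), background)
opt_enc.zero_grad()
enc_loss.backward()
opt_enc.step()
```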
arXiv Detail & Related papers (2023-09-03T20:20:48Z)
- Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks [29.47784194895489]
Action recognition, automatic video object segmentation (AVOS), and video instance segmentation (VIS) are studied.
Most examined models are biased toward static information.
Some datasets that are assumed to be biased toward dynamics are actually biased toward static information.
arXiv Detail & Related papers (2022-11-03T13:17:53Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning [7.27708818665289]
We propose a novel method to suppress static visual cues (SSVC) based on probabilistic analysis for self-supervised video representation learning.
By modelling static factors in a video as a random variable, the conditional distribution of each latent variable becomes a shifted and scaled normal distribution.
Finally, positive pairs are constructed by motion-preserved videos for contrastive learning to alleviate the problem of representation bias to static cues.
arXiv Detail & Related papers (2021-12-07T16:21:22Z)
- Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging [19.311818681787845]
We propose Foreground-background Merging (FAME) to compose the foreground region of the selected video onto the background of others.
We show that FAME can significantly boost the performance in different downstream tasks with various backbones.
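The merging step itself is a per-pixel composite under a foreground mask. A minimal sketch, assuming a soft mask in [0, 1] is already available (FAME's own mask estimation is not reproduced here):

```python
import torch

def fame_merge(fg_video, bg_video, fg_mask):
    """Composite the foreground of one clip over the background of another.

    fg_video, bg_video: (C, T, H, W) clips of the same size
    fg_mask:            (1, T, H, W) soft foreground mask in [0, 1]
    The merged clip keeps the action label of fg_video, since only its
    moving foreground survives the composite.
    """
    return fg_mask * fg_video + (1.0 - fg_mask) * bg_video
```

Training on such composites rewards models that track the moving actor instead of memorizing the background.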
arXiv Detail & Related papers (2021-09-30T13:45:26Z)
- VideoMix: Rethinking Data Augmentation for Video Classification [29.923635550986997]
State-of-the-art video action classifiers often suffer from overfitting.
Recent data augmentation strategies have been reported to address this overfitting problem.
VideoMix lets a model learn beyond the object and scene biases and extract more robust cues for action recognition.
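VideoMix adapts CutMix-style augmentation to clips: a region of one video is pasted into another and the labels are mixed in proportion to the pasted area. A minimal sketch of the spatial variant, with hypothetical names and one-hot labels assumed:

```python
import torch

def videomix(video_a, video_b, label_a, label_b, alpha=1.0):
    """CutMix-style mixing for clips: paste a random spatial patch of
    video_b into every frame of video_a and mix the one-hot labels by
    the pasted area (the spatial variant of cube mixing).

    videos: (C, T, H, W); labels: one-hot float vectors.
    """
    _, _, H, W = video_a.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    cut_h = int(H * (1.0 - lam) ** 0.5)
    cut_w = int(W * (1.0 - lam) ** 0.5)
    y = torch.randint(0, H - cut_h + 1, (1,)).item()
    x = torch.randint(0, W - cut_w + 1, (1,)).item()
    mixed = video_a.clone()
    mixed[:, :, y:y + cut_h, x:x + cut_w] = video_b[:, :, y:y + cut_h, x:x + cut_w]
    lam_adj = 1.0 - (cut_h * cut_w) / (H * W)  # proportion of video_a kept
    return mixed, lam_adj * label_a + (1.0 - lam_adj) * label_b
```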
arXiv Detail & Related papers (2020-12-07T05:40:33Z)