Overcoming Statistical Shortcuts for Open-ended Visual Counting
- URL: http://arxiv.org/abs/2006.10079v2
- Date: Wed, 1 Jul 2020 11:04:02 GMT
- Title: Overcoming Statistical Shortcuts for Open-ended Visual Counting
- Authors: Corentin Dancette and Remi Cadene and Xinlei Chen and Matthieu Cord
- Abstract summary: We aim to develop models that learn a proper mechanism of counting regardless of the output label.
First, we propose the Modifying Count Distribution protocol, which penalizes models that over-rely on statistical shortcuts.
Secondly, we introduce the Spatial Counting Network (SCN), which is dedicated to visual analysis and counting based on natural language questions.
- Score: 54.858754825838865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models tend to over-rely on statistical shortcuts. These
spurious correlations between parts of the input and the output labels do not
hold in real-world settings. We target this issue on the recent open-ended
visual counting task which is well suited to study statistical shortcuts. We
aim to develop models that learn a proper mechanism of counting regardless of
the output label. First, we propose the Modifying Count Distribution (MCD)
protocol, which penalizes models that over-rely on statistical shortcuts. It is
based on pairs of training and testing sets that do not follow the same count
label distribution such as the odd-even sets. Intuitively, models that have
learned a proper mechanism of counting on odd numbers should perform well on
even numbers. Secondly, we introduce the Spatial Counting Network (SCN), which
is dedicated to visual analysis and counting based on natural language
questions. Our model selects relevant image regions, scores them with fusion
and self-attention mechanisms, and provides a final counting score. We apply
our protocol to the recent TallyQA dataset and show superior performance
compared to state-of-the-art models. We also demonstrate the ability of our
model to select the correct instances to count in the image. Code and datasets
are available: https://github.com/cdancette/spatial-counting-network
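To make the MCD protocol concrete, here is a minimal sketch of the odd-even split described above. The example list and its field names are hypothetical stand-ins for a counting dataset such as TallyQA, not its actual schema.

```python
# Minimal sketch of an MCD-style odd-even split: train only on odd
# ground-truth counts, test only on even ones. A model with a genuine
# counting mechanism should transfer; one relying on count-label
# statistics should not. Field names are hypothetical.

def mcd_odd_even_split(examples):
    train = [ex for ex in examples if ex["count"] % 2 == 1]
    test = [ex for ex in examples if ex["count"] % 2 == 0]
    return train, test

examples = [
    {"question": "How many cats are there?", "image": "img0.jpg", "count": 3},
    {"question": "How many dogs are there?", "image": "img1.jpg", "count": 2},
]
train_set, test_set = mcd_odd_even_split(examples)
```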
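The SCN pipeline the abstract describes (select relevant regions, score them with fusion and self-attention, aggregate into a final count) can be sketched roughly as follows. This is an illustrative PyTorch skeleton with assumed layer sizes and aggregation, not the released architecture; see the repository linked above for the real model.

```python
import torch
import torch.nn as nn

class SCNSketch(nn.Module):
    """Illustrative skeleton of the pipeline described in the abstract:
    fuse region features with the question, relate regions via
    self-attention, score each region, and sum the scores into a soft
    count. Dimensions and the final aggregation are assumptions."""

    def __init__(self, region_dim=2048, question_dim=768, hidden=512):
        super().__init__()
        self.fusion = nn.Linear(region_dim + question_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, question):
        # regions: (batch, n_regions, region_dim); question: (batch, question_dim)
        q = question.unsqueeze(1).expand(-1, regions.size(1), -1)
        fused = torch.relu(self.fusion(torch.cat([regions, q], dim=-1)))
        attended, _ = self.attn(fused, fused, fused)  # self-attention over regions
        scores = torch.sigmoid(self.score(attended)).squeeze(-1)  # per-region score in [0, 1]
        return scores.sum(dim=-1)  # soft count per image
```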
Related papers
- Bound Tightening Network for Robust Crowd Counting [0.3626013617212667]
We propose a novel Bound Tightening Network (BTN) for Robust Crowd Counting.
It consists of three parts: a base model, a smooth regularization module, and a certify bound module.
Experiments on different benchmark datasets for counting demonstrate the effectiveness and efficiency of BTN.
arXiv Detail & Related papers (2024-09-27T21:18:31Z)
- Active Statistical Inference [14.00987234726578]
The methodology uses a machine learning model to identify which data points are most beneficial to label.
It achieves the same level of accuracy with far fewer samples than existing baselines.
arXiv Detail & Related papers (2024-03-05T18:46:50Z)
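A common way to realize "identify which data points are most beneficial to label" is uncertainty sampling; the sketch below is a generic illustration under that assumption, not the paper's actual estimator, and `predict_proba` is a hypothetical scikit-learn-style callable.

```python
import numpy as np

def select_for_labeling(predict_proba, unlabeled_X, budget):
    """Pick the `budget` points on which the model is least confident.
    Generic uncertainty sampling, assumed here as a stand-in for the
    paper's selection rule."""
    probs = predict_proba(unlabeled_X)      # shape (n, n_classes)
    confidence = probs.max(axis=1)          # top-class probability per point
    return np.argsort(confidence)[:budget]  # least-confident indices first
```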
- Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts [119.22672589020394]
We propose COnfidence-baSed MOdel Selection (CosMoS), where model confidence can effectively guide model selection.
We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift.
arXiv Detail & Related papers (2023-06-19T18:48:15Z)
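The summary says model confidence guides the choice among candidate models. A minimal sketch, assuming "confidence" means the average top-class softmax probability and that candidates expose a scikit-learn-style `predict_proba`:

```python
import numpy as np

def confidence_based_selection(models, X_test):
    """Pick the candidate model with the highest mean top-class
    confidence on a given test set. A simplified stand-in for
    CosMoS; the paper's actual selection rule may differ."""
    def mean_confidence(model):
        return model.predict_proba(X_test).max(axis=1).mean()
    return max(models, key=mean_confidence)
```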
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution problem.
We employ a DETR-based encoder-decoder design with conditional queries, which also significantly reduces the entity label space.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Improving Zero-Shot Models with Label Distribution Priors [33.51714665243138]
We propose a new approach, CLIPPR, which adapts zero-shot models for regression and classification on unlabelled datasets.
We demonstrate an improvement of 28% in mean absolute error on the UTK age regression task.
We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%, without using any labels.
arXiv Detail & Related papers (2022-12-01T18:59:03Z)
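One standard way to exploit a label-distribution prior on unlabeled data is to penalize the divergence between the batch-averaged predicted distribution and the prior. The sketch below illustrates that idea only; it is not necessarily CLIPPR's exact objective.

```python
import torch
import torch.nn.functional as F

def prior_matching_loss(logits, prior):
    """KL divergence between a known label prior and the batch-averaged
    predicted label distribution. An illustrative adaptation loss in
    the spirit of label-distribution priors, assumed rather than taken
    from the paper.

    logits: (batch, n_classes); prior: (n_classes,), positive, sums to 1.
    """
    avg_pred = F.softmax(logits, dim=-1).mean(dim=0)  # batch-level label distribution
    return torch.sum(prior * (prior.log() - avg_pred.clamp_min(1e-8).log()))
```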
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources as labeling functions.
Existing statistical label models typically rely only on the outputs of the LFs, ignoring instance features when modeling the underlying generative process.
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
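As context for the label model described above, the simplest aggregator of labeling-function outputs is a majority vote, sketched below; statistical label models weight the sources, and the paper's point is that instance features should inform this step too. The ABSTAIN marker is a common convention, not this paper's API.

```python
from collections import Counter

ABSTAIN = -1  # common convention for a labeling function that abstains

def majority_vote(lf_outputs):
    """Aggregate one instance's labeling-function votes by majority,
    ignoring abstentions. The simplest possible label model, shown
    for illustration only."""
    votes = [y for y in lf_outputs if y != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

print(majority_vote([1, 1, 0, ABSTAIN, 1]))  # -> 1
```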
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and 1% labeled data.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
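Read broadly, cross-model pseudo-labeling means two models label confident unlabeled examples for each other. A minimal sketch under that reading; `predict_with_confidence` is a hypothetical helper, and CMPL's actual model pairing and training details are not reproduced here.

```python
def cross_pseudo_label(model_a, model_b, unlabeled, threshold=0.9):
    """Each model pseudo-labels unlabeled examples it is confident
    about, producing extra supervision for the other model. A generic
    sketch of the cross-model idea, not CMPL's implementation."""
    for_b, for_a = [], []
    for x in unlabeled:
        conf_a, label_a = model_a.predict_with_confidence(x)  # hypothetical API
        conf_b, label_b = model_b.predict_with_confidence(x)  # hypothetical API
        if conf_a >= threshold:
            for_b.append((x, label_a))  # A supervises B
        if conf_b >= threshold:
            for_a.append((x, label_b))  # B supervises A
    return for_a, for_b
```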
- Pre and Post Counting for Scalable Statistical-Relational Model Discovery [19.18886406228943]
Statistical-Relational Model Discovery aims to find statistically relevant patterns in relational data.
As with propositional (non-relational) graphical models, the major scalability bottleneck for model discovery is computing instantiation counts.
This paper takes a detailed look at the memory and speed trade-offs between pre-counting and post-counting strategies for relational learning.
arXiv Detail & Related papers (2021-10-19T07:03:35Z)
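The pre- versus post-counting trade-off can be illustrated on a toy relation: pre-counting materializes all instantiation counts up front (more memory, constant-time lookups), while post-counting computes each count on demand (no stored table, but every query rescans the data). The relation below is invented for illustration.

```python
from collections import Counter

# Toy Friend(x, y) relation, invented for illustration.
rows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

# Pre-counting: materialize counts for every value once.
# Costs memory, but each later lookup is a dictionary access.
pre_counts = Counter(x for x, _ in rows)

# Post-counting: compute a count only when a query asks for it.
# Nothing to store, but every query rescans the relation.
def post_count(person):
    return sum(1 for x, _ in rows if x == person)

assert pre_counts["alice"] == post_count("alice") == 2
```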
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
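The ingredient shared by the three models is a diversity score derived from average output probabilities. One plausible reading, sketched below, scores a batch as more diverse when its averaged token distribution concentrates less on any single token; the paper's exact measure may differ.

```python
import torch
import torch.nn.functional as F

def avgout_style_diversity(logits):
    """Average the per-step output distributions over a batch of
    decoder logits and score diversity as one minus the mass on the
    most-likely token. A rough, assumed illustration of measuring
    dullness from average output probabilities.

    logits: (batch, seq_len, vocab_size)
    """
    avg_dist = F.softmax(logits, dim=-1).mean(dim=(0, 1))  # (vocab_size,)
    return 1.0 - avg_dist.max().item()
```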